Linear encoding of sparse vectors is widely popular, but is commonly data-independent – missing any possible extra (but a priori unknown) structure beyond sparsity. In this paper, we present a new method to learn linear encoders that adapt to data, while still performing well with the widely used l1 decoder. The convex l1 decoder prevents gradient propagation as needed in standard gradient-based training. Our method is based on the insight that unrolling the convex decoder into T projected subgradient steps can address this issue. Our method can be seen as a data-driven way to learn a compressed sensing measurement matrix. We compare the empirical performance of 10 algorithms over 6 sparse datasets (3 synthetic and 3 real). Our experiments show that there is indeed additional structure beyond sparsity in the real datasets; our method is able to discover it and exploit it to create excellent reconstructions with fewer measurements (by a factor of 1.1-3x) compared to the previous state-of-the-art methods. We illustrate an application of our method in learning label embeddings for extreme multi-label classification, and empirically show that our method is able to match or outperform the precision scores of SLEEC, which is one of the state-of-the-art embedding-based approaches.
The goal is to design a measurement matrix A ∈ R^{m×d} for the ill-posed problem y = Ax, where x ∈ R^d, y ∈ R^m, and m < d. The problem of designing measurement matrices and reconstruction algorithms that recover sparse vectors from linear observations is called Compressed Sensing (CS), Sparse Approximation, or Sparse Recovery.
The recovery problem can be formulated as:
$$\underset{x' \in \mathbb{R}^d}{\arg\min} \; \|x'\|_0 \quad \text{s.t.} \quad Ax' = y$$
This is an NP-hard problem. Under conditions such as the Restricted Isometry Property (RIP) or the Null Space Property (NSP), the l0 norm can be relaxed to the l1 norm:
$$D(A, y) = \underset{x' \in \mathbb{R}^d}{\arg\min} \; \|x'\|_1 \quad \text{s.t.} \quad Ax' = y$$
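Because both the objective and the constraint are piecewise linear, this l1 decoder can be computed exactly as a linear program. Below is a minimal sketch using the standard basis-pursuit reformulation x' = u − v with u, v ≥ 0; the function name and the synthetic example are mine, not from the papers discussed here.

```python
import numpy as np
from scipy.optimize import linprog

def l1_decode(A, y):
    """Solve D(A, y) = argmin ||x||_1 s.t. Ax = y as a linear program.

    Basis-pursuit reformulation: write x = u - v with u, v >= 0,
    then minimize 1'u + 1'v subject to A(u - v) = y.
    """
    m, d = A.shape
    c = np.ones(2 * d)                      # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])               # A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:d], res.x[d:]
    return u - v

# usage: recover a 5-sparse vector from 40 Gaussian measurements
rng = np.random.default_rng(0)
d, m = 100, 40
x = np.zeros(d); x[rng.choice(d, 5, replace=False)] = rng.normal(size=5)
A = rng.normal(size=(m, d)) / np.sqrt(m)
x_hat = l1_decode(A, A @ x)
print(np.linalg.norm(x_hat - x))
```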
The authors are interested in vectors that are not only sparse but also have additional structure in their support. They propose a data-driven algorithm that learns a good linear measurement matrix A from data samples. The linear measurements are subsequently decoded with l1-minimization to estimate the unknown vector x.
Our method is an autoencoder for sparse data, with a linear encoder (the measurement matrix) and a complex non-linear decoder that approximately solves an optimization problem.
PCA is a data-driven dimensionality reduction method: an autoencoder whose encoder and decoder are both linear, and the one that achieves the lowest MSE among linear autoencoders. A linear encoder has two advantages: 1) encoding is a single matrix-vector multiplication, which is cheap to compute; 2) it is easy to interpret, since every column of the encoding matrix can be viewed as a feature embedding. Arora et al. found that pre-trained word embeddings such as GloVe and word2vec form good measurement matrices for text data: they need fewer measurements than random matrices when used with l1-minimization.
Given n sparse samples x1, x2, …, xn ∈ R^d, the problem is to find the measurement matrix A whose measurements are best recovered by the l1 decoder, i.e. the A that minimises the average reconstruction error of the decoded measurements, e.g.

$$\min_{A \in \mathbb{R}^{m \times d}} \; \frac{1}{n} \sum_{i=1}^{n} \big\| x_i - D(A, A x_i) \big\|_2^2$$
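A minimal PyTorch sketch of the gradient-unrolling idea described above: the l1 decoder is replaced by T projected subgradient steps, which are differentiable almost everywhere, so A (and here also the step sizes) can be trained end-to-end against the reconstruction error. The parameterisation, initialisation, and synthetic training data below are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class L1AE(nn.Module):
    """Linear encoder A with a decoder that unrolls T projected
    subgradient steps of  min ||x'||_1  s.t.  A x' = y."""
    def __init__(self, d, m, T=10):
        super().__init__()
        self.A = nn.Parameter(torch.randn(m, d) / m ** 0.5)
        self.log_step = nn.Parameter(torch.zeros(T))    # learnable step sizes
        self.T = T

    def project(self, x, y):
        # Euclidean projection onto the affine set {x : A x = y}:
        # x - A^T (A A^T)^{-1} (A x - y)
        A = self.A
        r = torch.linalg.solve(A @ A.T, (x @ A.T - y).T).T
        return x - r @ A

    def forward(self, x):
        y = x @ self.A.T                                # linear measurements
        x_hat = self.project(torch.zeros_like(x), y)    # minimum-norm feasible point
        for t in range(self.T):
            # subgradient step on ||x||_1, then project back onto {x : A x = y}
            x_hat = x_hat - self.log_step[t].exp() * torch.sign(x_hat)
            x_hat = self.project(x_hat, y)
        return x_hat

# training loop sketch on synthetic 10-sparse vectors
model = L1AE(d=256, m=64, T=15)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):
    x_batch = torch.zeros(32, 256)
    idx = torch.randint(0, 256, (32, 10))
    x_batch.scatter_(1, idx, torch.randn(32, 10))       # random sparse training batch
    loss = ((model(x_batch) - x_batch) ** 2).sum(dim=1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```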
Compressed sensing (CS) provides an elegant framework for recovering sparse signals from compressed measurements. For example, CS can exploit the structure of natural images and recover an image from only a few random measurements. CS is flexible and data efficient, but its application has been restricted by the strong assumption of sparsity and costly reconstruction process. A recent approach that combines CS with neural network generators has removed the constraint of sparsity, but reconstruction remains slow. Here we propose a novel framework that significantly improves both the performance and speed of signal recovery by jointly training a generator and the optimisation process for reconstruction via metalearning. We explore training the measurements with different objectives, and derive a family of models based on minimising measurement errors. We show that Generative Adversarial Nets (GANs) can be viewed as a special case in this family of models. Borrowing insights from the CS perspective, we develop a novel way of improving GANs using gradient information from the discriminator.
The goal of compressed sensing is to estimate a vector from an underdetermined system of noisy linear measurements, by making use of prior knowledge on the structure of vectors in the relevant domain. For almost all results in this literature, the structure is represented by sparsity in a well-chosen basis. We show how to achieve guarantees similar to standard compressed sensing but without employing sparsity at all. Instead, we suppose that vectors lie near the range of a generative model G: R^k → R^n. Our main theorem is that, if G is L-Lipschitz, then roughly O(k log L) random Gaussian measurements suffice for an l2/l2 recovery guarantee. We demonstrate our results using generative models from published variational autoencoder and generative adversarial networks. Our method can use 5-10x fewer measurements than Lasso for the same accuracy.
In recent years, neural network approaches have been widely adopted for machine learning tasks, with applications in computer vision. More recently, unsupervised generative models based on neural networks have been successfully applied to model data distributions via low-dimensional latent spaces. In this paper, we use Generative Adversarial Networks (GANs) to impose structure in compressed sensing problems, replacing the usual sparsity constraint. We propose to train the GANs in a task-aware fashion, specifically for reconstruction tasks. We also show that it is possible to train our model without using any (or much) non-compressed data. Finally, we show that the latent space of the GAN carries discriminative information and can further be regularized to generate input features for general inference tasks. We demonstrate the effectiveness of our method on a variety of reconstruction and classification problems.
This work builds on [Bora et al.][1]. [Bora et al.] replace the sparsity assumption in compressed sensing with a generative model and prove that, when the generative model satisfies certain conditions and the measurement matrix has i.i.d. random Gaussian entries, the following holds with high probability:
$$\|G(\hat{z}) - x\|_2 \;\le\; 6 \min_{z} \|G(z) - x\|_2 + 3\|\zeta\|_2 + 2\epsilon \tag{1}$$
In [Bora et al.], the loss function is taken to be
$$\mathrm{loss}(z) = \|y - AG(z)\|_2^2 + \lambda_{\mathrm{prior}} \|z\|_2^2 \tag{2}$$
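A minimal PyTorch sketch of reconstruction with this loss: z is optimised by gradient descent while the trained generator G and the measurement matrix A stay fixed. The shapes, hyperparameters, and the single random initialisation are simplifying assumptions (in practice several random restarts of z are used).

```python
import torch

def csgm_reconstruct(G, A, y, k, lam_prior=0.1, steps=1000, lr=1e-2):
    """Estimate x ~ G(z_hat) from measurements y = A x by minimising Eq. (2).

    Assumes G maps a (1, k) latent code to a (1, d) flattened signal and
    A is an (m, d) measurement matrix; y has shape (1, m).
    """
    z = torch.randn(1, k, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = ((y - G(z) @ A.T) ** 2).sum() + lam_prior * (z ** 2).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return G(z).detach(), z.detach()
```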
[Kabkab et al.] first prove that, under certain assumptions,
$$\lim_{t \to \infty} \mathbb{E}_x\!\left[\min_{z} \|G_t(z) - x\|_2\right] = 0 \tag{3}$$
This shows that the right-hand side of (1) becomes very small. However, the assumptions of this theorem are too strict to be met in practice, so [Kabkab et al.] consider task-aware GAN training, which allows the generator G to be optimised for the compressed sensing reconstruction task. They modify Goodfellow's original GAN training procedure by adding a step that optimises the loss function (2) by gradient descent, and then use the resulting latent variable z to continue the optimisation of D and G. In this way, compressed sensing is brought into GAN training: it guides the selection of the GAN's latent variables and thus yields a better GAN.
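A rough sketch of one iteration of this task-aware training loop; the loss functions, optimisers, shapes, and discriminator output conventions are simplified placeholders rather than the authors' exact setup.

```python
import torch
import torch.nn.functional as F

def task_aware_gan_step(G, D, A, x_real, opt_G, opt_D, k,
                        inner_steps=5, lam_prior=0.1, lr_z=1e-2):
    """One iteration: adapt z to the CS task via a few gradient steps on
    Eq. (2), then update D and G with that z. D is assumed to output logits."""
    y = x_real @ A.T                                     # compressed observations of real data

    # (i) latent optimisation: a few GD steps on loss(z) from Eq. (2)
    z = torch.randn(x_real.size(0), k, requires_grad=True)
    opt_z = torch.optim.SGD([z], lr=lr_z)
    for _ in range(inner_steps):
        loss_z = ((y - G(z) @ A.T) ** 2).sum(1).mean() + lam_prior * (z ** 2).sum(1).mean()
        opt_z.zero_grad(); loss_z.backward(); opt_z.step()
    z = z.detach()

    # (ii) discriminator update on real x vs. generated G(z)
    ones = torch.ones(x_real.size(0), 1)
    zeros = torch.zeros(x_real.size(0), 1)
    d_loss = F.binary_cross_entropy_with_logits(D(x_real), ones) + \
             F.binary_cross_entropy_with_logits(D(G(z).detach()), zeros)
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # (iii) generator update using the task-adapted latents
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), ones)
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```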
Going further, because the sampling process in compressed sensing is itself compressive, uncompressed data are often hard to obtain, so they propose training on a small amount of uncompressed data together with a large amount of compressed data. The algorithm trains two discriminators and one generator: the first discriminator distinguishes real training data x from generated data G(z), while the second distinguishes real compressed training data y from generated data AG(z). This change to the original GAN training algorithm does not reduce the representational power of the generator.
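A sketch of how the two discriminator losses might be combined in this semi-compressed setting; D1, D2, and the split into a small uncompressed batch x_small and a large compressed batch y_large are illustrative names, not the paper's notation.

```python
import torch
import torch.nn.functional as F

def two_discriminator_losses(G, D1, D2, A, x_small, y_large, z1, z2):
    """z1 / z2 are latent batches matching x_small / y_large in batch size.
    D1 and D2 are assumed to output logits."""
    bce = F.binary_cross_entropy_with_logits
    ones = lambda t: torch.ones(t.size(0), 1)
    zeros = lambda t: torch.zeros(t.size(0), 1)

    # D1: real uncompressed x vs. generated G(z1)
    d1_loss = bce(D1(x_small), ones(x_small)) + bce(D1(G(z1).detach()), zeros(z1))

    # D2: real compressed y vs. compressed generated samples A G(z2)
    d2_loss = bce(D2(y_large), ones(y_large)) + bce(D2(G(z2).detach() @ A.T), zeros(z2))

    # the shared generator is trained to fool both discriminators
    g_loss = bce(D1(G(z1)), ones(z1)) + bce(D2(G(z2) @ A.T), ones(z2))
    return d1_loss, d2_loss, g_loss
```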
Because the latent variable z has lower dimension than both the uncompressed data x and the compressed data y, the authors argue that z can serve as a feature for classification tasks, and they introduce a contrastive loss for such tasks.
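For reference, the standard (Hadsell-style) contrastive loss on pairs of latent codes has the following form; the paper's exact variant may differ.

```python
import torch

def contrastive_loss(z_i, z_j, same_class, margin=1.0):
    """Pull latents of same-class pairs together, push different-class
    pairs at least `margin` apart. `same_class` is a {0, 1} tensor."""
    d = (z_i - z_j).pow(2).sum(dim=1).sqrt()            # Euclidean distance between latent codes
    pos = same_class * d.pow(2)                         # same class: minimise distance
    neg = (1 - same_class) * torch.clamp(margin - d, min=0).pow(2)
    return (pos + neg).mean()
```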
We study image inverse problems with invertible generative priors, specifically normalizing flow models. Our formulation views the solution as the Maximum a Posteriori (MAP) estimate of the image given the measurements. Our general formulation allows for non-linear differentiable forward operators and noise distributions with long-range dependencies. We establish theoretical recovery guarantees for denoising and compressed sensing under our framework. We also empirically validate our method on various inverse problems including compressed sensing with quantized measurements and denoising with dependent noise patterns.
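Schematically, the MAP formulation referred to here can be written as follows, assuming a linear forward operator A for concreteness and using the change-of-variables density of the invertible generator G (notation mine, not necessarily the paper's):

$$\hat{x} = \arg\max_{x} \; \log p_{\text{noise}}\big(y - Ax\big) \;+\; \log p_Z\!\big(G^{-1}(x)\big) \;+\; \log\left|\det \frac{\partial G^{-1}(x)}{\partial x}\right|$$

For i.i.d. Gaussian noise the first term reduces to the usual least-squares fit; for dependent noise, the noise density itself models the long-range correlations mentioned in the abstract.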
Ill-posed linear inverse problems appear in many image processing applications, such as deblurring, super-resolution and compressed sensing. Many restoration strategies involve minimizing a cost function, which is composed of fidelity and prior terms, balanced by a regularization parameter. While a vast amount of research has been focused on different prior models, the fidelity term is almost always chosen to be the least squares (LS) objective, that encourages fitting the linearly transformed optimization variable to the observations. In this paper, we examine a different fidelity term, which has been implicitly used by the recently proposed iterative denoising and backward projections (IDBP) framework. This term encourages agreement between the projection of the optimization variable onto the row space of the linear operator and the pseudo-inverse of the linear operator (“back-projection”) applied on the observations. We analytically examine the difference between the two fidelity terms for Tikhonov regularization and identify cases (such as a badly conditioned linear operator) where the new term has an advantage over the standard LS one. Moreover, we demonstrate empirically that the behavior of the two induced cost functions for sophisticated convex and non-convex priors, such as total-variation, BM3D, and deep generative models, correlates with the obtained theoretical analysis.
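For concreteness, the least-squares fidelity term and the back-projection fidelity term discussed above can be written as follows, where A† is the Moore–Penrose pseudo-inverse, so A†A is the orthogonal projection onto the row space of A (notation mine):

$$\ell_{\mathrm{LS}}(x) = \tfrac{1}{2}\,\|y - Ax\|_2^2, \qquad \ell_{\mathrm{BP}}(x) = \tfrac{1}{2}\,\big\|A^{\dagger}(y - Ax)\big\|_2^2 = \tfrac{1}{2}\,\big\|A^{\dagger}y - A^{\dagger}A\,x\big\|_2^2$$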