Published: 2013-10-27 15:21:49 Author: rapoo

What is the difference between L1 and L2 regularization?

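In today's seminar a senior labmate talked about regularization with the L1 norm and the L2 norm, and in the evening I discussed the same question with a classmate: when exactly should we use L1, and when L2? Papers say that we generally use an L1 norm penalty when only a few of the components are principal factors, but why does L1 have this effect?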

A discussion from the web:

http://www.quora.com/Machine-Learning/What-is-the-difference-between-L1-and-L2-regularization

I found this site to be quite good; it often has discussions of machine-learning questions.

There are many ways to understand the need for and approaches to regularization. I won't attempt to summarize the ideas here, but you should explore statistics or machine learning literature to get a high-level view. In particular, you can view regularization as a prior on the distribution from which your data is drawn (most famously Gaussian for least-squares), as a way to punish high values in regression coefficients, and so on. I prefer a more naive but somewhat more understandable (for me!) viewpoint.

Let's say you wish to solve the linear problem $Ax = b$. Here, $A$ is a matrix and $b$ is a vector. We spend lots of time in linear algebra worrying about the exactly- and over-determined cases, in which $A$ is at least as tall as it is wide, but instead let's assume the system is under-determined, i.e. $A$ is wider than it is tall, in which case there generally exist infinitely many solutions to the problem at hand.
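To make "infinitely many solutions" concrete, here is a minimal sketch in plain Python. The one-equation, two-unknown system, the helper names `matvec` and `solution`, and the chosen particular and null-space vectors are all assumptions made up for illustration; they do not come from the original answer.

```python
# Hypothetical under-determined system: one equation, two unknowns,
# i.e. x1 + 2*x2 = 2. The matrix is wider (2 columns) than tall (1 row).
A = [[1.0, 2.0]]
b = [2.0]

def matvec(A, x):
    """Multiply matrix A (a list of rows) by the vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def solution(t):
    """Return one member of the solution family:
    a particular solution plus t times a null-space vector."""
    particular = [2.0, 0.0]   # satisfies A x = b
    null_vec = [-2.0, 1.0]    # satisfies A x = 0
    return [p + t * n for p, n in zip(particular, null_vec)]

# Every value of t gives a different x that still solves the system.
for t in [0.0, 0.5, 1.0, 3.0]:
    x = solution(t)
    print("t =", t, "x =", x, "A x =", matvec(A, x))
```

Each value of `t` yields a different vector that satisfies the same equation, which is why some extra criterion is needed to pick one.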

This case is troublesome, because there are multiple possible $x$'s you might want to recover. To choose one, we can solve the following optimization problem:

MINIMIZE $\|x\|$ SUBJECT TO $Ax = b$

This is called the least-norm solution. In many ways, it says "In the absence of any other information, I might as well make $x$ small."

But there's one thing I've neglected in the notation above: the norm $\|x\|$. It turns out, this makes all the difference!
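As a rough illustration of why the choice of norm matters, the sketch below picks the smallest solution of a toy system twice, once under the L2 norm and once under the L1 norm. The system (a single equation `a . x = b`), the function names, and the numbers are hypothetical. Both closed forms used here hold for the single-equation case only; a general matrix would need a least-squares or linear-programming solver instead.

```python
import math

# Hypothetical single-equation system a . x = b with two unknowns.
a = [1.0, 2.0]
b = 2.0

def l1_norm(x):
    """L1 norm: sum of absolute values."""
    return sum(abs(v) for v in x)

def l2_norm(x):
    """L2 (Euclidean) norm: square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in x))

def min_l2_solution(a, b):
    """Least-L2-norm solution of a . x = b, namely x = a * b / (a . a)."""
    scale = b / sum(v * v for v in a)
    return [scale * v for v in a]

def min_l1_solution(a, b):
    """A least-L1-norm solution of a . x = b: put all the weight on the
    coordinate with the largest |a_i| (valid for one equation only)."""
    k = max(range(len(a)), key=lambda i: abs(a[i]))
    x = [0.0] * len(a)
    x[k] = b / a[k]
    return x

x_l2 = min_l2_solution(a, b)
x_l1 = min_l1_solution(a, b)
print("min L2:", x_l2, "L1 =", l1_norm(x_l2), "L2 =", l2_norm(x_l2))
print("min L1:", x_l1, "L1 =", l1_norm(x_l1), "L2 =", l2_norm(x_l1))
```

With these numbers the L2 choice is roughly (0.4, 0.8), with both entries nonzero, while the L1 choice is (0, 1), with one entry exactly zero. That is the sparsity effect usually attributed to L1 penalties, shown here only on a toy case.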

In particular, consider the vectors [formula] and [formula]. We can compute two possible norms:
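The specific vectors are not shown in this copy, so the sketch below uses two hypothetical stand-ins, `u` and `v`, chosen only to show how the L1 and L2 norms are computed. They should not be read as the answer's own example.

```python
import math

def l1_norm(x):
    """L1 norm: sum of absolute values."""
    return sum(abs(v) for v in x)

def l2_norm(x):
    """L2 (Euclidean) norm: square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in x))

# Hypothetical example vectors, chosen only for illustration.
u = [0.5, 0.5]
v = [-1.0, 0.0]

for name, vec in [("u", u), ("v", v)]:
    print(name, "L1 =", l1_norm(vec), "L2 =", l2_norm(vec))
```

For these stand-in vectors the two L1 norms are equal, while the L2 norm of `u` is smaller than that of `v`, so the two norms can rank the same pair of vectors differently.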