ICML ended two weeks ago, but I still wanted to write about my reading list, as there were some quite interesting papers (the proceedings are here). Also, I haven't blogged in ages, for which I really have no excuse ;)

There are three topics that I am particularly interested in and that got a lot of attention at this year's ICML: neural networks, feature expansion and kernel approximation, and structured prediction.

But first:

James Bergstra, Daniel Yamins, David Cox

This is the newest in a series of papers by James Bergstra on hyperparameter optimization. I quite enjoy his work, and his hyperopt software is in active use in my lab. In computer vision applications in particular, there is so much engineering that it is very hard to separate research contributions from engineering contributions. This paper shows 1) how important engineering is and 2) how far automation of the engineering part can really go.

## Neural Networks

Now, let's come to the somewhat most unlikely candidate, neural networks. They have gained a lot of attention in the more machine-learny circles in the last couple of years. Still, I was a bit surprised how many papers, in particular very empirical ones, made it to ICML.

Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, Rob Fergus

One of the zoo of follow-ups on the drop-out work by Hinton, this paper suggests randomly setting weights to zero, instead of hidden unit activations. It achieves better accuracy and is more efficient than drop-out.
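To make the difference concrete, here is a minimal NumPy sketch of one training-time forward pass for each variant. The function names, the ReLU activation and the keep probability are my own assumptions for illustration; the sketch leaves out rescaling and whatever the paper does at test time.

```python
import numpy as np

def dropout_forward(W, x, keep_prob, rng):
    """Drop-out: compute activations, then zero whole hidden units."""
    h = np.maximum(W @ x, 0.0)                     # ReLU is an assumption here
    unit_mask = rng.random(h.shape) < keep_prob    # one mask entry per hidden unit
    return h * unit_mask

def dropconnect_forward(W, x, keep_prob, rng):
    """Weight masking: zero individual weights, then compute activations."""
    weight_mask = rng.random(W.shape) < keep_prob  # one mask entry per weight
    return np.maximum((W * weight_mask) @ x, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))   # 4 hidden units, 6 inputs (toy sizes)
x = rng.normal(size=6)
h_dropout = dropout_forward(W, x, 0.5, rng)
h_dropconnect = dropconnect_forward(W, x, 0.5, rng)
```

With drop-out a hidden unit is either fully on or fully off, while with masked weights each unit sees a random subset of its inputs.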

Ian Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio

One of the most impressive follow-ups on the drop-out work, this paper demonstrates how to combine drop-out with a maximum nonlinearity. That's right. The only nonlinearity is the maximum over a group of hidden units. I feel this is pretty innovative and the results speak for themselves. The authors argue that the max non-linearity allows the network to learn a piecewise linear approximation of any convex activation function. Unfortunately, it is not really clear from the paper how much of the performance can be attributed to the max non-linearity, as there are no results without max-out.
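As a small illustration of the convex-approximation argument, the sketch below builds a single max-out unit whose linear pieces are tangent lines of x², set by hand rather than learned. The `maxout` function, the array shapes and the choice of x² are assumptions made for this example only.

```python
import numpy as np

def maxout(x, W, b):
    """Max-out layer.

    W has shape (units, pieces, inputs), b has shape (units, pieces).
    Each unit outputs the maximum over its linear pieces.
    """
    z = np.einsum("upi,i->up", W, x) + b   # all linear pieces, shape (units, pieces)
    return z.max(axis=1)

# Hand-set pieces: the tangent to x**2 at anchor a is 2*a*x - a**2.
anchors = np.linspace(-2.0, 2.0, 9)
W = (2.0 * anchors).reshape(1, -1, 1)      # slopes, one unit with 9 pieces
b = (-anchors ** 2).reshape(1, -1)         # intercepts

xs = np.linspace(-2.0, 2.0, 101)
approx = np.array([maxout(np.array([x]), W, b)[0] for x in xs])
max_gap = np.max(xs ** 2 - approx)         # worst-case gap to the true function
```

The maximum over tangents never exceeds x², and the gap shrinks as more pieces are added; in the paper the pieces are of course learned, not fixed like this.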

Ilya Sutskever, James Martens, George Dahl, Geoffrey Hinton

This work investigates relations between momentum and Nesterov's accelerated gradients. It argues that together with the right initialization, learning with momentum can yield much better models.
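The relation between the two methods is easiest to see when both are written as velocity updates: the only difference is where the gradient is evaluated. The following is a sketch on a toy quadratic; the objective, learning rate and momentum value are my assumptions, not settings from the paper.

```python
import numpy as np

def grad(theta, curvature):
    """Gradient of the toy objective 0.5 * sum(curvature * theta**2)."""
    return curvature * theta

def classical_momentum(theta, curvature, lr, mu, steps):
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = mu * v - lr * grad(theta, curvature)            # gradient at current point
        theta = theta + v
    return theta

def nesterov_momentum(theta, curvature, lr, mu, steps):
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = mu * v - lr * grad(theta + mu * v, curvature)   # gradient at look-ahead point
        theta = theta + v
    return theta

curvature = np.array([1.0, 100.0])   # badly conditioned toy problem
theta0 = np.array([1.0, 1.0])
theta_cm = classical_momentum(theta0, curvature, lr=0.01, mu=0.9, steps=500)
theta_nag = nesterov_momentum(theta0, curvature, lr=0.01, mu=0.9, steps=500)
```

Both runs start from the same point and take an identical first step, since the velocity is still zero; afterwards the look-ahead gradient lets the Nesterov variant correct its velocity earlier.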

## Kernel Approximation and Feature Extraction

Alex Gittens, Michael Mahoney

This work compares sampling-based and projection-based methods for low-rank approximations. I haven't looked into the details yet, but I'm a big fan of the Nyström method for kernel approximations, so I will definitely see what's in there.
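For readers who haven't seen it, this is roughly what the plain sampling-based Nyström approximation looks like: pick a few columns of the kernel matrix uniformly at random and reconstruct the rest from them. The RBF kernel, the toy data, the number of landmarks and the `rcond` cutoff are all assumptions for this sketch; the paper studies more refined sampling and projection schemes.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """RBF kernel matrix exp(-gamma * ||x - y||^2) between rows of X and Y."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def nystroem_approximation(X, n_landmarks, gamma, rng):
    """Low-rank kernel approximation from uniformly sampled columns."""
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    C = rbf_kernel(X, X[idx], gamma)     # sampled columns, shape (n, m)
    W = C[idx]                           # kernel among the landmarks, shape (m, m)
    # The rcond cutoff drops tiny singular values for numerical stability.
    return C @ np.linalg.pinv(W, rcond=1e-10) @ C.T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
K = rbf_kernel(X, X, gamma=0.5)
K_approx = nystroem_approximation(X, n_landmarks=50, gamma=0.5, rng=rng)
rel_error = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
```

Only the n × m block `C` ever needs to be computed in practice; the full matrix `K` is built here just to measure the error.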

Krishnakumar Balasubramanian, Kai Yu, Guy Lebanon

The authors propose a new sparse coding framework using non-parametric kernel smoothing. They provide generalization bounds for sparse dictionary learning and demonstrate benefits compared to standard sparse coding and Locally Linear Coding.

## Structured Prediction

Jeremy Jancsary, Sebastian Nowozin, Carsten Rother

This is quite exciting work by the folks from MSRC whom I met during my internship. They propose to use a QP relaxation for learning structured prediction. Basically, they parametrize the problem in such a way that inference via the QP relaxation is always convex, and they learn within this restricted family. I have only skimmed it so far ;)

Philipp Kraehenbuehl, Vladlen Koltun

This is a continuation of the authors' work on dense random fields for semantic image segmentation. It is another example of "learning for inference". In their previous work, it was shown that mean-field inference can be implemented efficiently by convolutions in certain cases. Here, the authors show how it is possible to directly minimize the loss of the prediction produced by mean-field inference.
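The idea of "learning for inference" can be sketched on a toy problem: run a fixed number of mean-field updates, treat the result as a function of the model parameters, and look at how the loss of that prediction changes with a parameter. Everything below is an assumption for illustration: the 1-D toy data, the dense affinity matrix (instead of the efficient filtering), the single pairwise weight `w`, and the finite-difference gradient in place of analytic gradients.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_field(unary, affinity, w, n_iters=5):
    """Naive mean-field for a Potts-style model with a fixed number of updates.

    unary: (n, labels) costs; affinity: (n, n) with zero diagonal.
    """
    Q = softmax(-unary)
    for _ in range(n_iters):
        message = affinity @ Q               # agreement with similar nodes
        Q = softmax(-unary + w * message)
    return Q

def loss(w, unary, affinity, labels):
    """Mean negative log marginal of the true labels after mean-field."""
    Q = mean_field(unary, affinity, w)
    return -np.mean(np.log(Q[np.arange(len(labels)), labels]))

def numerical_grad(w, unary, affinity, labels, eps=1e-5):
    """Central finite difference of the loss with respect to w."""
    return (loss(w + eps, unary, affinity, labels)
            - loss(w - eps, unary, affinity, labels)) / (2.0 * eps)

rng = np.random.default_rng(0)
n = 20
positions = np.linspace(0.0, 1.0, n)
labels = (positions > 0.5).astype(int)       # two segments on a line
affinity = np.exp(-(positions[:, None] - positions[None, :]) ** 2 / (2 * 0.1 ** 2))
np.fill_diagonal(affinity, 0.0)
unary = rng.normal(size=(n, 2))              # noisy costs
unary[np.arange(n), labels] -= 1.0           # true label is slightly cheaper

w = 0.0
g = numerical_grad(w, unary, affinity, labels)
w_next = w - 0.1 * g                         # one gradient step on the pairwise weight
```

The point is only that the loss is defined on the output of the approximate inference procedure itself, so the parameters are tuned for the predictions that procedure actually makes.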

There are several more papers on optimization for inference and/or learning, but I can't possibly list them all. There are also some interesting theory papers, for example on random forests.

Thanks, Andy for mentioning my paper :)

It seems like I'll have to go over the list of papers at ICML 2013 again. Although I was there myself, the sheer number of talks and posters was a bit too overwhelming.

It is really overwhelming. I wish I was there.

I went over the proceedings three times and still found new interesting stuff I had overlooked in the first two reads.

The last word of Cho's paper title is missing, though.

I've referenced the Gittens et al. paper multiple times in my dissertation. Great work.
