# PNAS

2018-ECCV-Progressive Neural Architecture Search

• Johns Hopkins University && Google AI && Stanford
• GitHub: 300+ stars
• Citation: 504

## Motivation

Current techniques usually fall into one of two categories: evolutionary algorithms (EA) or reinforcement learning (RL).

Although both EA and RL methods have been able to learn network structures that outperform manually designed architectures, they require significant computational resources.

## Contribution

We describe a method that requires 5 times fewer model evaluations during architecture search.

We propose to use heuristic search to explore the space of cell structures, starting with simple (shallow) models and progressing to complex ones, pruning out unpromising structures as we go.

Since this process is expensive, we also learn a model, or surrogate function, which can predict the performance of a structure without needing to train it.

First, the simple structures train faster, so we quickly obtain initial results with which to train the surrogate.

Second, we only ask the surrogate to predict the quality of structures that are slightly different (larger) from the ones it has seen.

Third, we factorize the search space into a product of smaller search spaces, allowing us to potentially search models with many more blocks.
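Putting these pieces together, the search proceeds as a beam search over cell sizes. Below is a minimal sketch of that loop, assuming hypothetical hooks `expand` (enumerate all one-block-larger children), `train_and_eval` (a short training run that returns validation accuracy), and a `surrogate` with `fit`/`predict`; it is an illustration, not the paper's code.

```python
import heapq

def progressive_search(one_block_cells, expand, train_and_eval, surrogate,
                       max_blocks=5, K=256):
    # Start with all 1-block cells: few enough to enumerate and train.
    candidates = list(one_block_cells)
    accs = [train_and_eval(c) for c in candidates]
    surrogate.fit(candidates, accs)

    for b in range(2, max_blocks + 1):
        # Expand every survivor by one block; this set is far too large
        # to train, so the surrogate ranks the children instead.
        children = [child for c in candidates for child in expand(c)]
        scores = surrogate.predict(children)
        # Prune: keep only the K children the surrogate ranks highest.
        best = heapq.nlargest(K, zip(scores, children), key=lambda t: t[0])
        candidates = [c for _, c in best]
        accs = [train_and_eval(c) for c in candidates]  # train survivors only
        surrogate.fit(candidates, accs)                 # refresh the surrogate
    return max(zip(accs, candidates), key=lambda t: t[0])
```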

We show that our approach is 5 times more efficient than the RL method of [41] in terms of the number of models evaluated, and 8 times faster in terms of total compute.

## Method

### Search Space

We first learn a cell structure and then stack this cell a desired number of times to create the final CNN.

The operator search space contains 8 candidate operations.
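For reference, the 8 operations listed in the paper, together with a toy block encoding (the tuple layout is my own illustration, not the paper's code):

```python
# The 8 candidate operations of the PNAS search space.
OPS = [
    "identity",
    "3x3 depthwise-separable conv",
    "5x5 depthwise-separable conv",
    "7x7 depthwise-separable conv",
    "1x7 then 7x1 conv",
    "3x3 average pooling",
    "3x3 max pooling",
    "3x3 dilated conv",
]

# A block combines two inputs, each transformed by one operation, via
# element-wise addition. Inputs index earlier blocks in the cell or the
# outputs of the two previous cells.
example_block = (0, "3x3 depthwise-separable conv", 1, "3x3 max pooling")
```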

We stack a predefined number of copies of the basic cell (with the same structure but untied weights), using either stride 1 or stride 2, as shown in Figure 1 (right).

The number of stride-1 cells between stride-2 cells is adjusted accordingly, with up to N repeats; the number of Normal (stride-1) cells is therefore controlled by the hyperparameter N.

We only use one cell type: we do not distinguish between Normal and Reduction cells, but instead emulate a Reduction cell by using a Normal cell with stride 2.
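A sketch of how the final CNN is assembled from one learned cell, assuming a hypothetical `Cell(spec, stride)` constructor that builds one copy of the cell with its own untied weights:

```python
def build_network(cell_spec, Cell, N=2, stages=3):
    # Each stage stacks N stride-1 ("Normal") cells, then one stride-2
    # cell that halves the spatial resolution and stands in for a
    # Reduction cell. Every copy shares the structure in `cell_spec`
    # but gets its own (untied) weights.
    layers = []
    for _ in range(stages):
        layers += [Cell(cell_spec, stride=1) for _ in range(N)]
        layers.append(Cell(cell_spec, stride=2))
    return layers
```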

Many previous approaches directly search in the space of full cells, or worse, full CNNs.

While this is a more direct approach, we argue that it is difficult to directly navigate in an exponentially large search space, especially at the beginning where there is no knowledge of what makes a good model.

### Performance Prediction with Surrogate Model

Requirements of the predictor:

• Handle variable-sized inputs
• Correlate with true performance
• Be sample-efficient

The requirement that the predictor be able to handle variable-sized strings immediately suggests the use of an RNN.

Two predictor methods are considered: an RNN and an MLP (multi-layer perceptron).

However, since the sample size is very small, we fit an ensemble of 5 predictors; we observed empirically that this reduced the variance of the predictions.
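A minimal sketch of the 5-member ensemble using scikit-learn's MLPRegressor. The paper trains its own RNN/MLP on encoded cell strings; here the fixed-size feature matrix X, the hyperparameters, and the seed-only diversity between members are all assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

class EnsemblePredictor:
    # Average 5 independently initialized MLPs to reduce the variance of
    # accuracy predictions when only a few (cell, accuracy) pairs exist.
    def __init__(self, n_members=5):
        self.members = [
            MLPRegressor(hidden_layer_sizes=(100,), max_iter=2000,
                         random_state=seed)
            for seed in range(n_members)
        ]

    def fit(self, X, y):
        # X: fixed-size feature vectors per cell (a stand-in for the
        # paper's learned embeddings of variable-length cell strings).
        for m in self.members:
            m.fit(X, y)
        return self

    def predict(self, X):
        # Ensemble prediction = mean of the members' predictions.
        return np.mean([m.predict(X) for m in self.members], axis=0)
```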

## Experiments

### Performance of the Surrogate Predictors

We train the predictor on the observed performance of cells with up to b blocks, but apply it to cells with b+1 blocks.

We therefore consider predictive accuracy both for cells with sizes that have been seen before (but which have not been trained on), and for cells which are one block larger than the training data.

We randomly select K = 256 models (each of size b) from $$U_{b,1:R}$$ to generate a training set $$S_{b,t,1:K}$$.

We now use this random dataset to evaluate the performance of the predictors using the pseudocode in Algorithm 2, where A(H) returns the true validation set accuracies of the models in some set H.

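In spirit, this evaluation reduces to a rank-correlation check between predicted and true accuracies; a sketch using scipy (the function and argument names are mine, not Algorithm 2's):

```python
from scipy.stats import spearmanr

def evaluate_predictor(predictor, cells_same, accs_same,
                       cells_larger, accs_larger):
    # Compare predictions against the true accuracies A(H) for (i) unseen
    # cells of a size the predictor was trained on and (ii) cells one
    # block larger than anything in its training data.
    rho_same, _ = spearmanr(predictor.predict(cells_same), accs_same)
    rho_larger, _ = spearmanr(predictor.predict(cells_larger), accs_larger)
    return rho_same, rho_larger
```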

We see that the predictor performs well on models from the training set, but not so well when predicting larger models. However, performance does increase as the predictor is trained on more (and larger) cells.

We see that for predicting the training set (cells of size B = b), the RNN does better than the MLP, but for predicting the performance of unseen larger models (B = b+1, the setting we care about in practice), the MLP seems to do slightly better.

## Conclusion

The main contribution of this work is to show how to accelerate the search for good CNN structures by using progressive search through the space of increasingly complex graphs, combined with a learned prediction function that efficiently identifies the most promising models to explore.

The resulting models achieve the same level of performance as previous work but with a fraction of the computational cost.