Paper Reading | Energy and Policy Considerations for Deep Learning in NLP

Paper: https://arxiv.org/abs/1906.02243v1

Authors: Emma Strubell, Ananya Ganesh, Andrew McCallum

Institution: University of Massachusetts Amherst

 

Research problem:

A very "green" paper. The authors observe that while the accuracy of recent NLP models has improved considerably, training them consumes a great deal of compute, and that compute in turn means substantial financial cost, energy use, and CO2 emissions, which is not environmentally friendly. The paper therefore quantifies the economic and environmental costs of several recent neural models, hoping to raise awareness among NLP researchers.

 

Research methods:

Progress in NLP has been driven by advances in neural network techniques and in hardware. Ten years ago most NLP models could be trained on a personal computer; today they are trained on many GPUs or TPUs, which is financially expensive.

Even with sufficient funding, training such a model places a real burden on the environment. The table below (from the paper) compares the CO2 emitted by NLP model training with the emissions of some everyday activities.

[Table from the paper: estimated CO2 emissions (lbs) of model training compared with, e.g., a round-trip flight and a car's lifetime; not reproduced here.]

One might counter that much of this electricity already comes from renewable sources. In practice, however, a large share of the energy powering this computation still comes from non-renewable sources (the paper shows the energy-source breakdown for several cloud providers and countries).

Measurement method: during training, the NVIDIA System Management Interface (nvidia-smi) is queried repeatedly to sample GPU power draw, and Intel's RAPL interface is used to sample CPU and DRAM power draw. Let p_c be the average power draw of the CPUs, p_r the average power draw of the DRAM, p_g the average power draw of a single GPU, g the number of GPUs, and t the total training time in hours. The total power consumption (in kWh) is:

p_t = 1.58 t (p_c + p_r + g p_g) / 1000

Here 1.58 is the power usage effectiveness (PUE) coefficient, which accounts for the extra energy a data center spends on overhead such as cooling; the paper uses the 2018 industry average.
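As a rough illustration of this measurement procedure, here is a minimal Python sketch. It assumes nvidia-smi is available and samples its reported power draw; the CPU and DRAM averages (p_c, p_r) are passed in as assumed constants rather than read from Intel's RAPL interface, and the function names and sampling interval are hypothetical.

```python
import subprocess
import time

PUE = 1.58  # power usage effectiveness factor used in the paper


def sample_gpu_power_watts():
    """Query nvidia-smi once and return the current power draw (watts) of each GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        text=True)
    return [float(line) for line in out.strip().splitlines()]


def estimate_total_kwh(train_hours, p_c_watts, p_r_watts,
                       num_samples=10, interval_s=1.0):
    """Estimate total energy p_t (kWh) via p_t = 1.58 t (p_c + p_r + g p_g) / 1000.

    p_c_watts / p_r_watts are assumed average CPU and DRAM draws; in the paper
    these are sampled from Intel's RAPL interface instead of being fixed.
    """
    samples = []
    for _ in range(num_samples):
        samples.append(sample_gpu_power_watts())
        time.sleep(interval_s)
    g = len(samples[0])                                        # number of GPUs
    p_g = sum(sum(s) for s in samples) / (len(samples) * g)    # avg watts per GPU
    return PUE * train_hours * (p_c_watts + p_r_watts + g * p_g) / 1000.0
```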

This power consumption is then converted into CO2 emissions using the average CO2 produced per kWh of electricity consumed in the US, as published by the US Environmental Protection Agency:

CO2e = 0.954 p_t   (pounds of CO2, with p_t in kWh)
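A corresponding one-line conversion, using the 0.954 lb CO2 per kWh EPA average cited in the paper:

```python
def kwh_to_co2e_lbs(kwh):
    """Convert energy (kWh) to estimated CO2 emissions in pounds,
    using the EPA's US average of 0.954 lb CO2 per kWh."""
    return 0.954 * kwh
```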

 

Model training cost:

Four very popular models are analyzed: Transformer, ELMo, BERT, and GPT-2.

The experimental results are as follows:

[Table from the paper: hardware, training time, energy, CO2e, and estimated cloud compute cost for each model; not reproduced here.]

The cloud computing costs here are estimates. As the table shows, the costs are substantial.
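To make the orders of magnitude concrete, here is a back-of-the-envelope sketch for a hypothetical training run; the GPU count, hours, power draws, and per-hour cloud price are placeholder assumptions, not numbers from the paper.

```python
# Hypothetical run: 8 GPUs for 84 hours (placeholder values, not from the paper).
g, hours = 8, 84
p_c, p_r, p_g = 100.0, 25.0, 250.0   # assumed average watts for CPU, DRAM, one GPU

kwh = 1.58 * hours * (p_c + p_r + g * p_g) / 1000.0
co2e_lbs = 0.954 * kwh
cloud_usd = g * hours * 3.0          # assumed on-demand price of $3 per GPU-hour

print(f"energy: {kwh:.0f} kWh, CO2e: {co2e_lbs:.0f} lb, cloud cost: ${cloud_usd:,.0f}")
```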

For example, the architecture search of So et al. (2019) reached 29.7 BLEU on English-to-German translation, an improvement of only 0.1 BLEU, at an estimated cloud compute cost of at least US$150,000.

Model development cost:

As a case study, the authors use the EMNLP 2018 best long paper (which, on inspection, is by the same group as this paper).

Training associated with that project spanned 172 days, during which 123 small hyperparameter grid searches were performed; the total GPU time came to 9,998 GPU-days, roughly 27 years. The following table lists the costs involved.
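The GPU-time figure can be sanity-checked with simple arithmetic; the per-GPU-hour prices below are placeholder assumptions meant only to show how a cloud-cost range would be derived, not the paper's exact rates.

```python
gpu_days = 9998
print(gpu_days / 365, "GPU-years")      # ~27.4 years, matching the "about 27 years" above

gpu_hours = gpu_days * 24
low, high = 0.50, 1.50                  # assumed $/GPU-hour on-demand price range
print(f"cloud cost range: ${gpu_hours * low:,.0f} - ${gpu_hours * high:,.0f}")
```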

[Table from the paper: estimated cloud compute and electricity cost for training one model, one hyperparameter tuning run, and all models developed during the project; not reproduced here.]

Finally, the authors make the following recommendations:

1. Researchers should report training time and sensitivity to hyperparameters (a minimal reporting sketch follows this list).

2. Academic researchers should have equitable access to computing resources.

3. Researchers should prioritize computationally efficient hardware and algorithms.
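For recommendation 1, here is a minimal sketch of how a run's wall-clock time and hyperparameters could be logged for later reporting; the helper name, log format, and the hypothetical train() call are assumptions, not anything prescribed by the paper.

```python
import json
import time
from contextlib import contextmanager


@contextmanager
def report_run(log_path, hyperparams):
    """Record wall-clock training time and the hyperparameters of one run,
    so time and hyperparameter sensitivity can be reported alongside accuracy."""
    start = time.time()
    try:
        yield
    finally:
        record = {"hyperparams": hyperparams,
                  "wall_clock_hours": (time.time() - start) / 3600.0}
        with open(log_path, "a") as f:
            f.write(json.dumps(record) + "\n")

# Usage (train() is a hypothetical training function):
# with report_run("runs.jsonl", {"lr": 1e-3, "layers": 12}):
#     train()
```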

Evaluation:

The authors' suggestions are insightful. They also point out that the current reliance on massive compute turns research into a "rich get richer" game, and argue that governments should give all researchers equitable access to resources. Overall, the authors sketch their ideal, almost utopian ("Peach Blossom Spring") vision of NLP research.


Source: www.cnblogs.com/bernieloveslife/p/12748603.html