PyTorch vs. TensorFlow: which is faster? Benchmark results from the 15,000+-star Transformers library




Author | Lysandre Debut
Translator | Lu Li
Produced by | AI Tech Base Camp (ID: rgznai100)
 
[Lead] Transformers is a library of pre-trained natural language processing models that implements state-of-the-art Transformer architectures for several NLP tasks, such as text classification, information extraction, and text generation. It is widely used by researchers and companies, and provides both PyTorch and TensorFlow front-end implementations.

Which framework trains and runs Transformer models more efficiently, PyTorch or TensorFlow? This article compares their performance in different environments; the bottom line is that on both CPU and GPU, the two frameworks end up performing similarly.

Transformers library:
https://github.com/huggingface/transformers
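
As a quick illustration of the dual front-ends, here is a minimal sketch (assuming the v2-era Transformers API; the checkpoint name and example sentence are illustrative, not from the article) that loads the same pretrained weights through both PyTorch and TensorFlow:

```python
# Minimal sketch: loading one pretrained checkpoint through both front-ends.
import torch
from transformers import BertModel, BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# PyTorch front-end: returns torch tensors
pt_model = BertModel.from_pretrained("bert-base-uncased")
pt_inputs = tokenizer.encode("Benchmarking is fun", return_tensors="pt")
with torch.no_grad():
    pt_outputs = pt_model(pt_inputs)

# TensorFlow front-end: same weights, returns tf tensors
tf_model = TFBertModel.from_pretrained("bert-base-uncased")
tf_inputs = tokenizer.encode("Benchmarking is fun", return_tensors="tf")
tf_outputs = tf_model(tf_inputs)
```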
 
Since the TensorFlow release, we have been working on productionizing the models, making them usable on TPU, and progressively improving their performance.
 
This article compares the performance our models exhibit in several environments, specifically the inference results of PyTorch (1.3.0) and TensorFlow (2.0) on CPU and GPU. For a number of reasons, this article is just the first in a series about benchmarks and subsequent performance optimizations. In addition, we have created a benchmark section in the documentation, which will keep improving as the models are studied further and benchmarked in more environments.

Results


The averaged test results are shown in the table below; the sections that follow discuss these results in detail.
(Table: average inference time)

Full Transformers benchmark test results: https://url.cn/5hZHCll
 
An N/A entry in the spreadsheet means the run either exhausted the available memory or used a sequence length too long for the model. Transformer-XL (Transformer extra-long, a model designed to extend the sequence lengths a Transformer can handle) has no TorchScript results, because it currently cannot be serialized by TorchScript (PyTorch's method for creating serializable and optimizable models from PyTorch code).
 
In most cases, TensorFlow and PyTorch models obtain very similar results on both GPU and CPU. The discussion of the results below covers not only the comparison between PyTorch and TensorFlow, but also comparisons between the models themselves.

Measuring inference

Inference time is an important metric when putting a model into production. To assess the models' inference time, we compared different models across different batch sizes and sequence lengths: batch sizes of [1, 2, 4, 8] and sequence lengths of [8, 64, 128, 256, 512, 1024]. The batch sizes stay small because we focus on inference settings only. BERT and similar models have a maximum sequence length of 512 (256 for CTRL), so they cannot be measured at the largest sequence length.
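
To make the grid concrete, here is a minimal sketch (not the article's actual harness) of how dummy inputs for such a sweep might be built; the random token IDs and the vocabulary size are illustrative assumptions:

```python
# Sketch of building dummy inputs for the benchmark grid described above.
import torch

batch_sizes = [1, 2, 4, 8]
sequence_lengths = [8, 64, 128, 256, 512, 1024]
vocab_size = 30522  # BERT's vocabulary size (assumption for illustration)

for batch_size in batch_sizes:
    for seq_len in sequence_lengths:
        if seq_len > 512:
            continue  # BERT-like models cap out at 512 tokens
        input_ids = torch.randint(0, vocab_size, (batch_size, seq_len))
        # ... time the model's forward pass on input_ids here
```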
 
Here are the two environments in which we ran the tests:
 

  • On CPU, a GCP n1-standard-32 instance, which has 32 vCPUs and 120 GB of RAM; the CPU platform is Intel Skylake;

 

  • On GPU, a custom GCP machine with 12 vCPUs, 40 GB of RAM, and a single V100 GPU (16 GB of VRAM).


Experimental details and best practices

To maximize performance, we applied some further optimizations:
 

  • The Intel Xeon CPU used above supports the AVX and AVX2 extensions; TensorFlow can only use these extensions when compiled from source, so we compiled it from source;

 

  • We used tf.function with pre-traced models to make sure we were not running TensorFlow in eager mode;

 

  • We compared results both with and without library-dependent tools: TorchScript for PyTorch, and XLA (auto-clustering) with TensorFlow on GPU; both tools are detailed later;

 

  • We used the native Python module timeit to measure inference time. Each experiment ran with repeat=30 and number=3; averaging the 30 values then gives the expected average inference time. Results are usually very stable with 30 or more values (see the sketch after this list);

 

  • We did not use a production environment such as TFX; the model invocation methods we measured are PyTorch's nn.Module.forward and TensorFlow's tf.keras.layers.Layer.call;

 

  • We took care to use the appropriate CUDA versions for both TensorFlow and PyTorch.
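
The sketch below shows how the timeit and tf.function points above fit together. It is a minimal stand-in, not the actual benchmarks.py harness; the tiny Linear/Dense models are hypothetical placeholders for the real Transformers models:

```python
# Minimal timing sketch: timeit with repeat=30 / number=3, tf.function
# with pre-tracing, and direct calls to nn.Module / tf.keras layers.
import timeit

import tensorflow as tf
import torch

pt_model = torch.nn.Linear(128, 128)   # stand-in PyTorch model
pt_inputs = torch.randn(8, 128)

tf_model = tf.keras.layers.Dense(128)  # stand-in TensorFlow model
tf_inputs = tf.random.normal((8, 128))

def pt_inference():
    with torch.no_grad():              # inference only, no gradients
        pt_model(pt_inputs)

# Wrap the TensorFlow call so it does not run in eager mode, and call
# it once beforehand so tracing cost is excluded from the timing.
tf_inference = tf.function(lambda: tf_model(tf_inputs))
tf_inference()

for name, fn in [("pytorch", pt_inference), ("tensorflow", tf_inference)]:
    runtimes = timeit.repeat(fn, repeat=30, number=3)
    avg = sum(runtimes) / len(runtimes) / 3  # average seconds per call
    print(name, avg)
```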


Discussion


PyTorch and TensorFlow


In most cases, the two frameworks obtain similar results. Compared with PyTorch, TensorFlow is usually slightly slower on CPU but slightly faster on GPU:
 

  • Across all models on CPU, PyTorch's average inference time is 0.748s, while TensorFlow's is 0.823s;

 

  • Across all models on GPU, PyTorch's average inference time is 0.046s, while TensorFlow's is 0.043s.


These results compare inference times by averaging across all models, so larger inputs weigh more heavily in the final outcome. When the input is too large, PyTorch runs out of memory; those results are removed from all measurements when computing the average, since keeping them would skew the results in PyTorch's favor.
 
In practice, PyTorch models tend to run out of memory earlier than TensorFlow models: apart from the distilled models, PyTorch runs out of memory once the input reaches a batch size of 8 and a sequence length of 1024.

TorchScript


TorchScript is PyTorch's way of creating serializable models that can run in runtimes without Python dependencies, such as a C++ environment. Our test traces the model in Python and reuses the traced model in that same environment. Before measuring its inference time, we make sure the traced model performs a forward pass in advance.
 
Disclaimer: although TorchScript was not created to increase speed in a Python environment, our results indicate that using a traced model can improve performance.
 
TorchScript seems to be very dependent on the model and on the input size (sequence length × batch size). For example, XLNet gets a consistent performance boost from TorchScript, while using TorchScript on XLM can be problematic: it improves performance on smaller inputs but hurts performance on larger ones.
 
On average, inference with a TorchScript-traced model is 20% faster than with the same non-traced PyTorch model.
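
A minimal sketch of the tracing step, assuming the v2-era Transformers API (the checkpoint name and input sentence are illustrative):

```python
# Sketch of TorchScript tracing as described above.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# torchscript=True configures the model so that it can be traced
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True)
model.eval()

inputs = tokenizer.encode("Benchmark me", return_tensors="pt")

# Trace the model with example inputs, then warm it up with a forward
# pass before any timing, as described above.
traced_model = torch.jit.trace(model, inputs)
with torch.no_grad():
    traced_model(inputs)
```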

XLA

XLA is a linear algebra compiler that can increase the speed of TensorFlow models; we used it only on GPU. It relies on TensorFlow's auto-clustering, which compiles some of the model's subgraphs.
 
The results improve in both speed and memory efficiency: most internal benchmarks run about 1.15x faster after XLA is enabled.
 
All of our models saw improved performance after enabling XLA. In some extreme cases, especially with small inputs, inference time decreased by up to 70%.
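
In TensorFlow 2.x, auto-clustering can be switched on with a single global flag; a hedged sketch (actual speedups will vary by model and input size):

```python
# Sketch: turning on XLA auto-clustering globally in TensorFlow 2.x.
import tensorflow as tf

tf.config.optimizer.set_jit(True)  # enable XLA auto-clustering

# ... build and run the model as usual; TensorFlow will compile
# eligible subgraphs with XLA automatically.
```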

Models and distilled versions

The distilled model versions stand out in this benchmark thanks to their speed. Both of Hugging Face's distilled models, DistilBERT and DistilGPT-2, cut inference time roughly in half compared with their teacher models.
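
Switching to a distilled model is essentially a one-line change in the library; a small sketch, with checkpoint names as published by Hugging Face (shown for illustration):

```python
# Sketch: swapping a teacher model for its distilled counterpart.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")
```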

Contributions


Because different benchmarks involve different setups and tools, this is not something a single organization can cover on its own, and we welcome benchmarks from the wider community. GitHub user @tlkh has already made a significant contribution by benchmarking the performance of TensorFlow models with AMP, XLA, and distribution strategies; those results are being added to the benchmark section of the documentation.


How to contribute


If you would like to participate, we have set up issue templates on GitHub that make this easier: you can open an issue with results you have obtained, or open a pull request to add them to the benchmark section of the documentation.

Benchmark script


Together with this article and the benchmarks page of the documentation, we have added a new script to the examples: benchmarks.py, the script used to obtain the detailed results above. It can run benchmarks on TensorFlow or PyTorch, using XLA or TorchScript, and save the results to a CSV file.

Next steps

Benchmarking the models' inference performance is only a first step. We believe this introductory article can help in comparing the current state of the models, especially when studying the differences between PyTorch and TensorFlow. As we dig deeper into productionizing Transformers, we will keep working on performance improvements.
 
Stay tuned for follow-up articles on automated scripts for PyTorch and TensorFlow, new architectures, and custom TPU training.

Original link:

https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2


