Same old routine for tackling something new:
I. Install TF (not the GPU build, not the TPU build, and certainly not an "AI" build). My system is CentOS 7.5, so I followed the official Linux guide:
https://www.tensorflow.org/install/install_linux
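Where there is code there is trouble, so this install was never going to go smoothly. Here are the pitfalls I hit and how I got past them: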
1. In step 3, the command easy_install -U pip kept timing out on the network. The fix was to use the Douban mirror:
easy_install -i http://pypi.douban.com/simple/ -U pip
In fact my pip version was already new enough; this just upgraded it to a newer release.
Before the upgrade:
$ pip -V
pip 9.0.1 from /home/xxx/test/tensorflow/lib/python2.7/site-packages (python 2.7)
After the upgrade:
$ pip -V
pip 18.0 from /home/xxx/test/tensorflow/lib/python2.7/site-packages/pip-18.0-py2.7.egg/pip (python 2.7)
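The same mirror trick can be scripted. Below is a minimal Python 3 sketch that only builds the pip command line; the helper name build_pip_cmd is hypothetical. It also adds --trusted-host, because newer pip versions ignore a plain-http index unless its host is explicitly trusted.

```python
import sys
from urllib.parse import urlparse


def build_pip_cmd(packages, index_url="http://pypi.douban.com/simple/", upgrade=True):
    """Build a pip command that installs `packages` from a mirror index (hypothetical helper)."""
    parsed = urlparse(index_url)
    # Run pip through the current interpreter so the active virtualenv is used.
    cmd = [sys.executable, "-m", "pip", "install", "-i", index_url]
    # A plain-http mirror must also be marked as trusted, or pip skips it.
    if parsed.scheme == "http":
        cmd += ["--trusted-host", parsed.hostname]
    if upgrade:
        cmd.append("-U")
    return cmd + list(packages)


if __name__ == "__main__":
    print(" ".join(build_pip_cmd(["pip"])))
```

To actually run it, pass the list to subprocess.check_call(build_pip_cmd(["pip"])).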
2. Step 5, pip install --upgrade tensorflow, failed, and this time adding -i http://pypi.douban.com/simple/ did not help either.
2.1) First, this machine is aarch64, so the .whl package path has to be changed. According to "Installing the TensorFlow framework on ARM 64-bit" (https://blog.csdn.net/qq_31261509/article/details/79835136), the package should be tensorflow-1.8.0-cp27-none-linux_aarch64.whl. So the command became: pip install --upgrade https://download.tensorflow.google.cn/linux/cpu/tensorflow-1.8.0-cp27-none-linux_aarch64.whl. But download.tensorflow.google.cn only delivered about 1.x KB/s, and the download simply timed out.
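Why the path changes: the wheel file name encodes the Python version and the CPU architecture. The sketch below guesses a file name in the same shape as the aarch64 one above; the helper wheel_name is hypothetical. The ABI tag is hard-coded to "none" only because that is what this aarch64 file uses; official x86_64 wheels use tags such as cp36m, so treat this purely as a naming illustration.

```python
import platform
import sys


def wheel_name(version, py=None, machine=None):
    """Guess a wheel file name like tensorflow-1.8.0-cp27-none-linux_aarch64.whl."""
    py = py or sys.version_info[:2]           # (major, minor) of the interpreter
    machine = machine or platform.machine()   # e.g. "aarch64" or "x86_64"
    python_tag = "cp%d%d" % (py[0], py[1])
    # ABI tag fixed to "none", matching the aarch64 file name in this post.
    return "tensorflow-%s-%s-none-linux_%s.whl" % (version, python_tag, machine)


print(wheel_name("1.8.0"))
```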
2.2) Next, find a way to speed up the network, following "Speeding up GitHub access (Windows + Linux)" (https://blog.csdn.net/senver_wen/article/details/80834652). The command was then changed to pip install --upgrade https://github.com/lhelontra/tensorflow-on-arm/releases/download/v1.8.0/tensorflow-1.8.0-cp27-none-linux_aarch64.whl
# Speeding up GitHub access
a. Edit /etc/hosts:
13.229.188.59 www.github.com
151.101.228.133 assets-cdn.github.com
151.101.73.194 github.global.ssl.fastly.net
b. Install nscd and flush the DNS cache (CentOS 7.5):
yum install nscd
systemctl restart nscd
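Before restarting nscd it helps to confirm the entries actually landed in /etc/hosts. A minimal sketch, assuming the standard "IP hostname [aliases...]" hosts format; parse_hosts and missing_entries are hypothetical helpers, and the IPs are the ones used in this post, which change over time.

```python
HOSTS_SNIPPET = """
13.229.188.59 www.github.com
151.101.228.133 assets-cdn.github.com
151.101.73.194 github.global.ssl.fastly.net
"""


def parse_hosts(text):
    """Parse /etc/hosts-style text into {hostname: ip}."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:                    # every alias maps to the same IP
            mapping[name] = ip
    return mapping


def missing_entries(required, path="/etc/hosts"):
    """Return the required {host: ip} pairs that the hosts file does not contain."""
    with open(path) as f:
        current = parse_hosts(f.read())
    return {host: ip for host, ip in required.items() if current.get(host) != ip}
```

Usage: print(missing_entries(parse_hosts(HOSTS_SNIPPET))) prints an empty dict once all three lines are in place.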
2.3) Right after tensorflow-1.8.0-cp27-none-linux_aarch64.whl finished downloading, it failed with:
Collecting numpy>=1.13.3 (from tensorflow==1.8.0)
Could not find a version that satisfies the requirement numpy>=1.13.3 (from tensorflow==1.8.0) (from versions: )
No matching distribution found for numpy>=1.13.3 (from tensorflow==1.8.0)
This looked like a numpy installation problem, so I installed it by hand with pip install numpy. Again I used the Chinaz webmaster tool (http://tool.chinaz.com/dns) to find a fast IP for files.pythonhosted.org:
151.101.73.63 files.pythonhosted.org
After repeating steps a. and b. from 2.2, numpy installed quickly (numpy is the Python package for scientific computing).
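The error above is a plain version constraint (numpy>=1.13.3). The sketch below checks an installed numpy against it; version_tuple and satisfies are hypothetical helpers and deliberately simplified (not full PEP 440), so for anything serious packaging.version.parse is the usual tool.

```python
import re


def version_tuple(v):
    """'1.14.5' -> (1, 14, 5); keeps the leading digits of each dotted part."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if not m:          # stop at a part with no leading digits
            break
        parts.append(int(m.group()))
    return tuple(parts)


def satisfies(installed, minimum):
    """True if `installed` >= `minimum` under simple numeric comparison."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    import numpy
    ok = satisfies(numpy.__version__, "1.13.3")
    print("numpy", numpy.__version__, "ok" if ok else "too old for tensorflow 1.8.0")
except ImportError:
    print("numpy is not installed")
```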
2.4) Finally, I saw the following message:
Successfully installed absl-py-0.3.0 astor-0.7.1 backports.weakref-1.0.post1 bleach-1.5.0 enum34-1.1.6 funcsigs-1.0.2 futures-3.2.0 gast-0.2.0 grpcio-1.14.1 html5lib-0.9999999 markdown-2.6.11 mock-2.0.0 pbr-4.2.0 protobuf-3.6.0 six-1.11.0 tensorboard-1.8.0 tensorflow-1.8.0 termcolor-1.1.0 werkzeug-0.14.1
(tensorflow) [xxxtest-3 bin]$
2.5) After installing, run a Hello check to validate the install:
https://www.tensorflow.org/install/install_linux#ValidateYourInstallation
# Python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
The result says GLIBC_2.23 is required, but my system currently has GLIBC 2.17:
ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found (required by /home/xx/test/tensorflow-v1.8/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so)
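Damn. Nothing complained at install time; it only blew up when I ran it.
Since other people also use this system, I wanted to avoid upgrading libc, so I tried downgrading TensorFlow to 1.4. Exactly the same damn result: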
ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found (required by /home/xxx/test/tensorflow-v1.4/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so)
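Because the glibc requirement only surfaces at import time, it is worth checking before installing. A minimal sketch, assuming a glibc-based Linux where os.confstr("CS_GNU_LIBC_VERSION") returns a string such as "glibc 2.17" (the same information ldd --version prints); glibc_version is a hypothetical helper.

```python
import os
import re


def glibc_version(s=None):
    """Return glibc (major, minor) parsed from a string like 'glibc 2.17', or None."""
    if s is None:
        try:
            s = os.confstr("CS_GNU_LIBC_VERSION")  # Linux/glibc only
        except (AttributeError, ValueError, OSError):
            return None
    m = re.search(r"(\d+)\.(\d+)", s or "")
    return (int(m.group(1)), int(m.group(2))) if m else None


REQUIRED = (2, 23)  # taken from the ImportError above

current = glibc_version()
if current is None:
    print("could not determine the glibc version")
elif current < REQUIRED:
    print("glibc %d.%d < %d.%d: this TensorFlow wheel will not import" % (current + REQUIRED))
else:
    print("glibc %d.%d is new enough" % current)
```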
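2.6) Alternatively, try upgrading glibc to 2.23.
Reference post: https://blog.csdn.net/qq708986022/article/details/77896791
2.7) Install TensorFlow on a VM (your own environment is still the most reliable)
The Ubuntu VM uses Python 3, and I again chose the virtualenv install. At step 5, pip3 install --upgrade tensorflow automatically fetched tensorflow-1.10.0-cp36-cp36m-manylinux1_x86_64.whl.
Log: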
Downloading https://files.pythonhosted.org/packages/ee/e6/a6d371306c23c2b01cd2cb38909673d17ddd388d9e4b3c0f6602bfd972c8/tensorflow-1.10.0-cp36-cp36m-manylinux1_x86_64.whl (58.4MB)
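But it still timed out, so I wondered whether pip3's timeout could be raised.
Searching online turned up a pip config with a long timeout. Note that pip reads ~/.pip/pip.conf (not pip.config), that trusted-host must name the same host as index-url, and that the use-mirrors/mirrors options found in old guides are no longer supported by pip. A consistent version:
$ cat ~/.pip/pip.conf
[global]
timeout = 6000
index-url = http://pypi.douban.com/simple
trusted-host = pypi.douban.com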
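Mistakes in this file fail silently, so a quick lint helps. A minimal Python 3 sketch using the standard configparser module; check_pip_conf is a hypothetical checker that only covers the two problems discussed here (an http index whose host is not trusted, and the obsolete use-mirrors option).

```python
import configparser
from urllib.parse import urlparse


def check_pip_conf(text):
    """Return a list of problems found in pip.conf-style text (hypothetical checker)."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    problems = []
    index = cfg.get("global", "index-url", fallback=None)
    trusted = cfg.get("global", "trusted-host", fallback="").split()
    if index:
        parsed = urlparse(index)
        # pip ignores a plain-http index unless its host is listed in trusted-host.
        if parsed.scheme == "http" and parsed.hostname not in trusted:
            problems.append("http index %s is not listed in trusted-host" % parsed.hostname)
    if cfg.has_option("install", "use-mirrors"):
        problems.append("use-mirrors is an obsolete option")
    return problems
```

Run against the config as it was originally copied from the web (http index on e.pypi.python.org, trusted-host pypi.douban.com, plus use-mirrors), it reports two problems; the Douban-only version reports none.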
This time the install reported success:
Successfully built termcolor absl-py gast
launchpadlib 1.10.6 requires testresources, which is not installed.
Installing collected packages: werkzeug, setuptools, protobuf, numpy, markdown, tensorboard, astor, termcolor, absl-py, gast, grpcio, tensorflow
Found existing installation: setuptools 40.0.0
Uninstalling setuptools-40.0.0:
Successfully uninstalled setuptools-40.0.0
Found existing installation: protobuf 3.0.0
Not uninstalling protobuf at /usr/lib/python3/dist-packages, outside environment /home/xxx/tensorflow
Can't uninstall 'protobuf'. No files were found to uninstall.
Found existing installation: numpy 1.15.0
Uninstalling numpy-1.15.0:
Successfully uninstalled numpy-1.15.0
Successfully installed absl-py-0.3.0 astor-0.7.1 gast-0.2.0 grpcio-1.14.1 markdown-2.6.11 numpy-1.14.5 protobuf-3.6.0 setuptools-39.1.0 tensorboard-1.10.0 tensorflow-1.10.0 termcolor-1.1.0 werkzeug-0.14.1
(tensorflow) -VirtualBox:~/tensorflow$
Following the official validation steps, I ran Hello TensorFlow, and it worked:
$ python
Python 3.6.5 (default, Apr 1 2018, 05:46:30)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2018-08-13 17:51:02.640633: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
>>>
At this point, TensorFlow is installed!!!
The official Python 3.6 packages for CPU and GPU:
CPU only: https://download.tensorflow.google.cn/linux/cpu/tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl
GPU support: https://download.tensorflow.google.cn/linux/gpu/tensorflow_gpu-1.8.0-cp36-cp36m-linux_x86_64.whl
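The two official links follow one pattern: the device directory (cpu/gpu), the package name (tensorflow/tensorflow_gpu), then version, Python tag, ABI tag and platform. A sketch that rebuilds them; tf_wheel_url is a hypothetical helper, and whether a given version/tag combination actually exists on the server is not checked.

```python
BASE = "https://download.tensorflow.google.cn/linux"


def tf_wheel_url(version="1.8.0", gpu=False, py_tag="cp36", abi_tag="cp36m", plat="linux_x86_64"):
    """Build a download URL following the pattern of the two official links."""
    device = "gpu" if gpu else "cpu"
    package = "tensorflow_gpu" if gpu else "tensorflow"
    return "%s/%s/%s-%s-%s-%s-%s.whl" % (BASE, device, package, version, py_tag, abi_tag, plat)


print(tf_wheel_url())
print(tf_wheel_url(gpu=True))
```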
II. Run the TF benchmarks
https://www.tensorflow.org/performance/benchmarks
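1. Study TF's architecture and dig into one key or interesting part. Of course I also need to get comfortable with TF's native language, Python (TF currently also supports C, Java and Go).
III. Open questions so far:
1. How does TensorFlow connect to big data? What is used as the input mechanism?
See the deployment docs at https://www.tensorflow.org/deploy/, in particular the official guide How to run TensorFlow on Hadoop, which has a highly self-explanatory title (https://www.tensorflow.org/deploy/hadoop):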
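import tensorflow as tf  # TF 1.x queue-based input pipeline
filename_queue = tf.train.string_input_producer([
    "hdfs://namenode:8020/path/to/file1.csv",
    "hdfs://namenode:8020/path/to/file2.csv",
])
reader = tf.TextLineReader()  # reads one CSV line per call
key, value = reader.read(filename_queue)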
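Reading hdfs:// paths only works when the Hadoop environment is set up first. The sketch below assumes the variables the Hadoop guide linked above asks for (JAVA_HOME, HADOOP_HDFS_HOME, LD_LIBRARY_PATH, CLASSPATH; check the guide for your TF version) and shows how the hdfs:// URLs are composed from the NameNode host and port. Both helpers are hypothetical.

```python
import os

# Assumption: variable list taken from the TensorFlow-on-Hadoop guide.
REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HDFS_HOME", "LD_LIBRARY_PATH", "CLASSPATH")


def missing_hdfs_env(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def hdfs_path(namenode, path, port=8020):
    """Build an hdfs:// URL like those passed to string_input_producer."""
    return "hdfs://%s:%d/%s" % (namenode, port, path.lstrip("/"))


print(missing_hdfs_env())
```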
2. How is TensorFlow laid out on a distributed system?
See the deployment docs at https://www.tensorflow.org/deploy/, in particular the official guide Distributed TensorFlow, which explains how to create a cluster of TensorFlow servers (https://www.tensorflow.org/deploy/distributed):
tf.train.ClusterSpec construction | Available tasks
---|---
tf.train.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]}) | /job:local/task:0, /job:local/task:1
tf.train.ClusterSpec({"worker": ["worker0.example.com:2222", "worker1.example.com:2222", "worker2.example.com:2222"], "ps": ["ps0.example.com:2222", "ps1.example.com:2222"]}) | /job:worker/task:0, /job:worker/task:1, /job:worker/task:2, /job:ps/task:0, /job:ps/task:1
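The table follows one rule: each job name maps to a list of addresses, and a task's index is its position in that list. The pure-Python sketch below (it does not import TensorFlow; cluster_tasks is a hypothetical helper) enumerates the task names a cluster dict defines. In TF 1.x itself, the same dict would be passed to tf.train.ClusterSpec and each process would start a tf.train.Server with its own job_name and task_index.

```python
def cluster_tasks(cluster):
    """Map each /job:<name>/task:<i> name to its address; i is the list position."""
    tasks = {}
    for job in sorted(cluster):
        for index, address in enumerate(cluster[job]):
            tasks["/job:%s/task:%d" % (job, index)] = address
    return tasks


cluster = {
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222", "worker2.example.com:2222"],
    "ps": ["ps0.example.com:2222", "ps1.example.com:2222"],
}
for name, address in cluster_tasks(cluster).items():
    print(name, address)
```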
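3. How does TensorFlow run on CPU + AI accelerator chips? Is it currently limited to CPU alone, CPU+GPU, or CPU+TPU?
IV. Going deeper:
1. TensorFlow has a broad feature set, but it is mainly used to build deep neural network models (an overview of the various DNN models: https://blog.csdn.net/scutjy2015/article/details/74170794). The easiest way to get started with TensorFlow is Eager Execution.
The official guides are very thorough: https://www.tensorflow.org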