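Recently I've been working on image captioning. I tried quite a few projects on GitHub, but they either didn't support my environment or wouldn't run. In the end I found one that runs on Win10 + Python 3 + TensorFlow. I only ran forward prediction (inference), not training, because the dataset is far too large for my underpowered computer. There were some pitfalls along the way, though not many. I'm writing them down in the hope that they'll be useful later, both to me and to others.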

Project repository

https://github.com/coldmanck/show-attend-and-tell

[Python 3] Tensorflow implementation of “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”

Usage

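Download and unzip the project
Install the dependencies
- Install the dependencies by following the links in the project README. NLTK is the most tedious one: its data is fairly large, so avoid downloading it to the system drive if you can.
Download the COCO dataset annotations
- Here is a Baidu Cloud link: https://pan.baidu.com/s/1TkiFEsh2dRnX7qAsxuCQ4w. Download and unzip it, then put the file captions_train2014.json in the folder train. Similarly, put the file captions_val2014.json in the folder val.
Download the pretrained model file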
- https://app.box.com/s/xuigzzaqfbpnf76t295h109ey9po5t8p
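- This link may not download for you, so I'll give a Baidu Cloud link.
Inference (forward prediction)
- Finally, the main part. Put some images in the folder test/images, then run the following from the project folder: python main.py --phase=test --model_file='./models/289999.npy' --beam_size=3. Yes, it really is 289999; that is the model's file name. Just put the downloaded model 289999.npy into the models folder. The generated captions will be saved in the folder test/results.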
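The setup steps above can also be scripted. Below is a minimal Python sketch, assuming the project was unzipped into a folder `project_dir` and the two annotation files were extracted into a folder `download_dir` (both names are hypothetical). The helpers `place_annotations`, `missing_for_test` and `build_test_command` are illustrative and not part of the repository: they copy the annotation JSON files into train and val, check that the model file, annotations and test images are where the steps above say they should be, and build the inference command.

```python
import os
import shutil
import sys

# Hypothetical mapping: where each COCO caption file goes, relative to the project folder.
ANNOTATION_TARGETS = {
    "captions_train2014.json": "train",
    "captions_val2014.json": "val",
}


def place_annotations(download_dir, project_dir):
    """Copy the COCO caption files into project_dir/train and project_dir/val."""
    placed = []
    for file_name, sub_dir in ANNOTATION_TARGETS.items():
        target_dir = os.path.join(project_dir, sub_dir)
        os.makedirs(target_dir, exist_ok=True)
        # shutil.copy returns the path of the newly created file.
        placed.append(shutil.copy(os.path.join(download_dir, file_name), target_dir))
    return placed


def missing_for_test(project_dir, model_name="289999.npy"):
    """Return a list of missing pieces for --phase=test; an empty list means ready."""
    problems = []
    model_path = os.path.join(project_dir, "models", model_name)
    if not os.path.isfile(model_path):
        problems.append("model file not found: " + model_path)
    for file_name, sub_dir in ANNOTATION_TARGETS.items():
        path = os.path.join(project_dir, sub_dir, file_name)
        if not os.path.isfile(path):
            problems.append("annotation file not found: " + path)
    image_dir = os.path.join(project_dir, "test", "images")
    if not os.path.isdir(image_dir) or not os.listdir(image_dir):
        problems.append("no images in " + image_dir)
    return problems


def build_test_command(model_name="289999.npy", beam_size=3):
    """Build the inference command as an argument list, so no shell quoting is involved."""
    return [
        sys.executable,
        "main.py",
        "--phase=test",
        "--model_file=./models/" + model_name,
        "--beam_size=" + str(beam_size),
    ]
```

For example, after `place_annotations(download_dir, project_dir)`, if `missing_for_test(project_dir)` returns an empty list you can start inference with `subprocess.run(build_test_command(), cwd=project_dir, check=True)`. Passing the arguments as a list hands them to Python unchanged, which sidesteps quoting differences between cmd and PowerShell.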

Errors and fixes

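At first you'll hit many print-related errors. This is minor and expected: the code was ported from Python 2, so just add parentheses to the print statements.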
In base_model.py, change the line image_name = os.path.splitext(image_name)[0] to image_name = os.path.splitext(image_name)[0].split('/')[-1]; otherwise you'll get a file-not-found error.
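Change plt.savefig(os.path.join(config.test_result_dir,image_name+'_result.jpg')) to plt.savefig(os.path.join(config.test_result_dir, image_name+'_result.png')); otherwise you'll get an error saying RGBA can't be converted to JPG. The image being saved has four channels (RGBA), so it can't be saved as JPG. I have no idea how the author got it to run. ￣□￣｜｜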
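There were also various other minor errors along the way that I no longer remember clearly; these two were the most troublesome, which is why they stuck with me. If I've missed anything, feel free to leave a comment.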
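Why the file-name fix is presumably needed: if image_name still contains its directory (for example ./test/images/cat.jpg), joining it onto test_result_dir gives a nested path such as ./test/results/./test/images/cat_result.jpg, whose folders don't exist. The minimal sketch below puts the split('/') fix from this post next to an alternative based on os.path.basename, which on Windows strips both / and \ separators, and builds the .png result path from the second fix (JPEG has no alpha channel, so an RGBA figure can't be written as JPG). The helper names are hypothetical and do not come from base_model.py.

```python
import os


def stem_author_fix(image_name):
    """The fix from this post: drop the extension, then keep the part after the last '/'."""
    return os.path.splitext(image_name)[0].split('/')[-1]


def stem_portable(image_name):
    """Alternative: os.path.basename strips the directory (on Windows it handles '/' and '\\')."""
    return os.path.splitext(os.path.basename(image_name))[0]


def result_path(result_dir, image_name):
    """Output path for the captioned figure, saved as .png because the figure is RGBA."""
    return os.path.join(result_dir, stem_portable(image_name) + "_result.png")
```

Both stem_author_fix('./test/images/cat.jpg') and stem_portable('./test/images/cat.jpg') return 'cat'. The difference shows up on Windows with a backslash path such as test\images\cat.jpg: the '/' split leaves the directory in place, while os.path.basename still strips it.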
