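I had a pile of face datasets to process, and doing it by hand would have been far too tedious.
So I tried Baidu's AI open platform, which is here:
https://cloud.baidu.com/product/face
It is very simple to use. First create an application on the platform, which gives you an AK and an SK.
These are used to obtain a token, and once you have the token you can send requests directly.
Code for obtaining the token:
Remember to fill in your own AK and SK.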
# encoding:utf-8
import requests

# client_id is the AK and client_secret is the SK obtained from the official site
host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=XXXXXX&client_secret=XXXXXXX'
response = requests.get(host)
if response:
    print(response.json())
The token is one long string (the access_token field of the printed JSON). Copy it and keep it somewhere safe.
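As a minimal sketch, the token request can be wrapped in small functions so that the AK and SK are URL-encoded and a failed request does not pass unnoticed. The names `build_token_url`, `extract_token` and `fetch_token` are hypothetical helpers, not part of any Baidu SDK, and this version uses the standard library's `urllib` instead of `requests`. It assumes the token sits under the `access_token` key of the response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

TOKEN_ENDPOINT = "https://aip.baidubce.com/oauth/2.0/token"


def build_token_url(ak, sk):
    """Build the token URL with the AK and SK URL-encoded."""
    query = urlencode({
        "grant_type": "client_credentials",
        "client_id": ak,
        "client_secret": sk,
    })
    return TOKEN_ENDPOINT + "?" + query


def extract_token(payload):
    """Return the token from a decoded response dict, or raise if missing."""
    token = payload.get("access_token")
    if not token:
        raise ValueError("no access_token in response: %r" % (payload,))
    return token


def fetch_token(ak, sk, timeout=10):
    """Request a token over the network and return it as a string."""
    with urlopen(build_token_url(ak, sk), timeout=timeout) as resp:
        return extract_token(json.loads(resp.read().decode("utf-8")))
```

Calling `fetch_token("your AK", "your SK")` would then return the string to paste into the scripts below.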
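Below I implement one such requirement: get the gender of each face and whether it is wearing glasses, split the images into four groups, and move each file into the matching one of four folders. If you have a different requirement, adapt this code to fit it.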
import base64
import os
import time
from shutil import move

import requests

token = "XXXXXXXXXXXXXXXXX"
_url = "https://aip.baidubce.com/rest/2.0/face/v3/detect" + "?access_token=" + token
headers = {'content-type': 'application/json'}
trs = 'A'
tr_ts = ['train', 'test']
path = 'F:\\XXX\\' + tr_ts[0] + trs  # trainA, trainB, testA, testB
ls = os.listdir(path)
ls.sort()
for i in ls:
    start = time.time()
    src = os.path.join(path, i)
    # Base64-encode the image and decode to str so it can be placed in a JSON body
    with open(src, 'rb') as f:
        imgBase64 = base64.b64encode(f.read()).decode('ascii')
    data = {"image": imgBase64, "image_type": "BASE64", "face_field": "age,gender,glasses"}
    # json= sends a JSON body, matching the content-type header
    res = requests.post(_url, json=data, headers=headers).json()['result']['face_list'][0]
    # print(res)
    has_glasses = res['glasses']['type'] != 'none'
    gender = res['gender']['type']
    # The four target folders (for example A\man_glasses) must already exist
    if has_glasses and gender == 'male':
        move(src, trs + '\\man_glasses\\')
    elif has_glasses and gender == 'female':
        move(src, trs + '\\woman_glasses\\')
    elif not has_glasses and gender == 'male':
        move(src, trs + '\\man_noglasses\\')
    else:
        move(src, trs + '\\woman_noglasses\\')
    print(time.time() - start)
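See the linked documentation for details. The API can return a lot of information and I only requested a little of it. What gets returned is controlled by the face_field parameter, and as you can see I only need age,gender,glasses.
The excerpt from the documentation shows that there are a great many attributes; pick whichever ones you need.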
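The four-way grouping by gender and glasses can also be written as one small function, which keeps the folder names in a single place. This is only a sketch: `target_folder` and `first_face` are hypothetical helper names, the sample dictionaries are made up for illustration, and the handling of an empty result assumes that the API leaves `result` empty when it finds no face.

```python
def first_face(response):
    """Return the first detected face, or None when the response has none."""
    result = response.get("result") or {}
    faces = result.get("face_list") or []
    return faces[0] if faces else None


def target_folder(face):
    """Map one face_list entry to one of the four folder names."""
    has_glasses = face["glasses"]["type"] != "none"
    is_male = face["gender"]["type"] == "male"
    prefix = "man" if is_male else "woman"
    suffix = "glasses" if has_glasses else "noglasses"
    return prefix + "_" + suffix


# Made-up sample in the shape the scripts in this post read from
sample = {"result": {"face_list": [
    {"gender": {"type": "female"}, "glasses": {"type": "none"}},
]}}
face = first_face(sample)
if face is not None:
    print(target_folder(face))
```

Checking for `None` before indexing means an image without a detectable face can be skipped instead of stopping the whole loop.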
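However, the code above is slow: the free tier only allows 2 requests per second, and network speed is a further limit.
So I wrote a multiprocessing version: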
# encoding:utf-8
import requests
import base64
import os
import time
import multiprocessing as mp


class MSP():
    def __init__(self):
        self.h = 'trainB'
        self.token = "XXXXX"
        self._url = "https://aip.baidubce.com/rest/2.0/face/v3/detect" + "?access_token=" + self.token
        self.headers = {'content-type': 'application/json'}
        self.path = 'F:\\XXX\\' + self.h
        self.ls = os.listdir(self.path)
        self.ls.sort()
        self.lres = []
        # A Manager list can be shared between the worker processes
        self.manager = mp.Manager
        self.mp_lst = self.manager().list()

    def post_func(self, i):
        # Runs in a worker process: query one image and store its attributes
        with open(os.path.join(self.path, i), 'rb') as f:
            img_base64 = base64.b64encode(f.read()).decode('ascii')
        data = {"image": img_base64, "image_type": "BASE64", "face_field": "age,gender,glasses"}
        # json= sends a JSON body, matching the content-type header
        res = requests.post(self._url, json=data, headers=self.headers).json()['result']['face_list'][0]
        res['name'] = i
        self.mp_lst.append(res)
        time.sleep(0.1)
        print(i)

    def flow(self):
        pool = mp.Pool(10)
        for i in self.ls:
            # Note: an exception inside post_func is dropped silently by apply_async
            pool.apply_async(self.post_func, args=(i,))
        pool.close()
        pool.join()


if __name__ == '__main__':
    start_time = time.time()
    msp = MSP()
    msp.flow()
    # Copy the shared list into a plain list and save its text form
    with open('XXX.txt', 'w') as f:
        f.write(str(list(msp.mp_lst)))
    print(time.time() - start_time)
This is roughly ten times faster than before, which is really quick.
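Because the free tier is limited to 2 requests per second, it can help to pace the requests explicitly rather than rely on a fixed sleep. The following is a minimal sketch of a per-process throttle; `Throttle` is a hypothetical helper, and the `clock` and `sleep` parameters exist only so that it can be tried without real waiting. It does not coordinate across processes, so with several workers each one would need a proportionally lower rate.

```python
import time


class Throttle:
    """Space successive wait() calls at least 1 / rate seconds apart."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_time = None  # earliest time the next call may proceed

    def wait(self):
        now = self.clock()
        if self.next_time is not None and now < self.next_time:
            # Too early: sleep until the reserved slot is reached
            self.sleep(self.next_time - now)
            now = self.next_time
        self.next_time = now + self.interval
```

In a single-process loop, creating `throttle = Throttle(2)` once and calling `throttle.wait()` before each request keeps the request rate at or below two per second.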
The attributes are saved to a txt file. To move the files, write a bit of code that reads the file contents back in:
# encoding:utf-8
import ast
import os
from shutil import move
import time

# trainB
path = 'F:\\XXX\\trainB'
trs = 'B'
with open('XXXX.txt') as f:
    lres_trainB = f.read()
# literal_eval parses the saved list without executing arbitrary code
for res in ast.literal_eval(lres_trainB):
    start = time.time()
    i = res['name']
    src = os.path.join(path, i)
    has_glasses = res['glasses']['type'] != 'none'
    gender = res['gender']['type']
    # The four target folders (for example B\man_glasses) must already exist
    if has_glasses and gender == 'male':
        move(src, trs + '\\man_glasses\\')
    elif has_glasses and gender == 'female':
        move(src, trs + '\\woman_glasses\\')
    elif not has_glasses and gender == 'male':
        move(src, trs + '\\man_noglasses\\')
    else:
        move(src, trs + '\\woman_noglasses\\')
    print(time.time() - start)
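The results file above holds the text form of a Python list, which then has to be parsed back as a Python literal. One alternative, sketched here, is to store the results as JSON, which any tool can read. `save_results` and `load_results` are hypothetical helper names, and the sketch assumes every stored value is a plain dict, list, string or number, as the parsed API responses are.

```python
import json


def save_results(results, filename):
    """Write the collected attribute dicts to a JSON file."""
    with open(filename, "w", encoding="utf-8") as f:
        # list() also turns a multiprocessing Manager list into a plain list
        json.dump(list(results), f, ensure_ascii=False)


def load_results(filename):
    """Read the attribute dicts back as a list."""
    with open(filename, encoding="utf-8") as f:
        return json.load(f)
```

The loop that moves the files stays the same, iterating over `load_results(...)` instead of the evaluated text.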
That completes the whole process.