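CelebA is a dataset released by The Chinese University of Hong Kong that contains 202,599 face images of 10,177 celebrity identities. Each image is annotated with the coordinates of 5 facial landmarks and with 40 binary attributes. The dataset can be downloaded from the Large-scale CelebFaces Attributes (CelebA) Dataset page.
For the meaning of each attribute, see the links given at the end of this post. The code below counts, for each attribute, how many images have it and how many do not.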
rootdir = "../"
imgdir = rootdir + "Img/img_celeba"
attributepath = rootdir + "Anno/list_attr_celeba.txt"


def stats():
    with open(attributepath) as f:
        # Line 1 holds the number of images, line 2 the attribute names.
        numofimgs = int(f.readline())
        attrs = f.readline().split()
        # counts[j] = [images with attribute j, images without attribute j]
        counts = [[0, 0] for _ in attrs]
        for _ in range(numofimgs):
            # Each row is "<filename> v1 ... v40"; drop the filename.
            items = f.readline().split()[1:]
            for j in range(len(attrs)):
                if items[j] == "1":
                    counts[j][0] += 1
                else:
                    counts[j][1] += 1
    for name, (pos, neg) in zip(attrs, counts):
        print(name, pos, neg)


if __name__ == "__main__":
    stats()
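The results are as follows (attribute name, number of images with the attribute, number of images without it):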
5_o_Clock_Shadow 22516 180083
Arched_Eyebrows 54090 148509
Attractive 103833 98766
Bags_Under_Eyes 41446 161153
Bald 4547 198052
Bangs 30709 171890
Big_Lips 48785 153814
Big_Nose 47516 155083
Black_Hair 48472 154127
Blond_Hair 29983 172616
Blurry 10312 192287
Brown_Hair 41572 161027
Bushy_Eyebrows 28803 173796
Chubby 11663 190936
Double_Chin 9459 193140
Eyeglasses 13193 189406
Goatee 12716 189883
Gray_Hair 8499 194100
Heavy_Makeup 78390 124209
High_Cheekbones 92189 110410
Male 84437 118162
Mouth_Slightly_Open 97942 104657
Mustache 8417 194182
Narrow_Eyes 23329 179270
No_Beard 169158 33441
Oval_Face 57567 145032
Pale_Skin 8701 193898
Pointy_Nose 56210 146389
Receding_Hairline 16163 186436
Rosy_Cheeks 13315 189284
Sideburns 11449 191150
Smiling 97669 104930
Straight_Hair 42222 160377
Wavy_Hair 64744 137855
Wearing_Earrings 38276 164323
Wearing_Hat 9818 192781
Wearing_Lipstick 95715 106884
Wearing_Necklace 24913 177686
Wearing_Necktie 14732 187867
Young 156734 45865
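
The raw counts are easier to compare once they are turned into a positive rate and a majority-to-minority ratio. The following is a minimal sketch of that arithmetic, not part of the original script: the helper name `imbalance_ratio` is hypothetical, and the four rows are copied by hand from the table above.

```python
# Counts copied from the results table: (images with attribute, images without).
counts = {
    "Bald": (4547, 198052),
    "Mustache": (8417, 194182),
    "Male": (84437, 118162),
    "Attractive": (103833, 98766),
}


def imbalance_ratio(pos, neg):
    """Return the majority/minority count ratio (1.0 means perfectly balanced)."""
    minority = min(pos, neg)
    if minority == 0:
        # One class is empty, so the ratio is unbounded.
        return float("inf")
    return max(pos, neg) / minority


for name, (pos, neg) in counts.items():
    total = pos + neg
    rate = pos / total
    ratio = imbalance_ratio(pos, neg)
    print(f"{name}: {rate:.1%} positive, imbalance {ratio:.1f}:1")
```

A ratio close to 1 indicates a roughly balanced attribute, while a large ratio indicates that one class dominates.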
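It is easy to see that some attributes are highly imbalanced, with ratios reaching 10:1 or more (Bald, for example, is 4547:198052). The male/female split is comparatively balanced at 84437:118162, so the Male column can be extracted and used as labels for gender recognition.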
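
One possible way to pull out such labels is sketched below. It assumes the same file layout the counting script relies on (an image count on the first line, attribute names on the second, then one row per image with the filename followed by `1`/`-1` flags). The function name `load_attribute_labels` and the 0/1 label encoding are illustrative choices, not something defined by the dataset.

```python
def load_attribute_labels(attr_path, attr_name="Male"):
    """Return {image filename: 1 or 0} for a single attribute column."""
    labels = {}
    with open(attr_path) as f:
        num_images = int(f.readline())
        names = f.readline().split()
        # Position of the requested attribute among the header names.
        col = names.index(attr_name)
        for _ in range(num_images):
            parts = f.readline().split()
            # parts[0] is the filename; the remaining fields are "1" / "-1".
            labels[parts[0]] = 1 if parts[1 + col] == "1" else 0
    return labels
```

Calling `load_attribute_labels("../Anno/list_attr_celeba.txt")` with the path used earlier should give a mapping whose values sum to the Male count reported in the table, which is a quick sanity check before using the labels for training.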
References: