Detailed Attribute Statistics for the CelebA Dataset

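CelebA is a dataset provided by the Chinese University of Hong Kong. It contains 202,599 face images covering 10,177 celebrity identities, and each image is annotated with 5 facial landmark coordinates and 40 attributes. It can be downloaded from the Large-scale CelebFaces Attributes (CelebA) Dataset page.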

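For the meaning of each attribute, see the link at the end of this post. The code below counts, for each attribute, how many images have it and how many do not.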

rootdir = "../"
imgdir = rootdir + "Img/img_celeba"  # image folder; not used by this script
attributepath = rootdir + "Anno/list_attr_celeba.txt"


def stats():
    with open(attributepath) as f:
        # Line 1 is the number of images, line 2 lists the attribute names.
        numofimgs = int(f.readline())
        attrs = f.readline().split()
        # counts[j] = [images with attribute j, images without attribute j]
        counts = [[0, 0] for _ in attrs]
        for _ in range(numofimgs):
            # Each row is "<filename> v1 v2 ..."; drop the file name.
            items = f.readline().split()[1:]
            for j, value in zip(range(len(attrs)), items):
                if value == "1":
                    counts[j][0] += 1
                else:
                    counts[j][1] += 1
    # Print: attribute name, count with the attribute, count without it.
    for name, (pos, neg) in zip(attrs, counts):
        print(name, pos, neg)


if __name__ == "__main__":
    stats()
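
To try the counting logic without the full dataset, it can also be written as a function over any iterable of lines. The following is a minimal sketch, not the original author's code: the name `count_attributes` and the three-row sample are made up for illustration, and it assumes the same layout the script above relies on (image count, then attribute names, then one row per image with values `1` or `-1`).

```python
import io


def count_attributes(lines):
    """Return {attribute: [positives, negatives]} from annotation lines.

    Assumes the layout used by the script above: line 1 is the image
    count, line 2 holds the attribute names, and every later line is
    "<filename> v1 v2 ..." where each value is 1 or -1.
    """
    it = iter(lines)
    num_images = int(next(it))
    attrs = next(it).split()
    counts = {name: [0, 0] for name in attrs}
    for _ in range(num_images):
        values = next(it).split()[1:]  # drop the file name
        for name, value in zip(attrs, values):
            # Index 0 counts positives ("1"), index 1 counts everything else.
            counts[name][0 if value == "1" else 1] += 1
    return counts


# Hypothetical three-image sample, not real CelebA data.
sample = io.StringIO(
    "3\n"
    "Bald Male Smiling\n"
    "000001.jpg -1  1  1\n"
    "000002.jpg -1 -1  1\n"
    "000003.jpg  1  1 -1\n"
)
sample_counts = count_attributes(sample)
for name, (pos, neg) in sample_counts.items():
    print(name, pos, neg)
```

With the real annotation file, passing `open(attributepath)` instead of the sample should give the same per-attribute counts as the script.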

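The results are as follows. Each row gives the attribute name, the number of images with the attribute, and the number without it: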

5_o_Clock_Shadow 22516 180083
Arched_Eyebrows 54090 148509
Attractive 103833 98766
Bags_Under_Eyes 41446 161153
Bald 4547 198052
Bangs 30709 171890
Big_Lips 48785 153814
Big_Nose 47516 155083
Black_Hair 48472 154127
Blond_Hair 29983 172616
Blurry 10312 192287
Brown_Hair 41572 161027
Bushy_Eyebrows 28803 173796
Chubby 11663 190936
Double_Chin 9459 193140
Eyeglasses 13193 189406
Goatee 12716 189883
Gray_Hair 8499 194100
Heavy_Makeup 78390 124209
High_Cheekbones 92189 110410
Male 84437 118162
Mouth_Slightly_Open 97942 104657
Mustache 8417 194182
Narrow_Eyes 23329 179270
No_Beard 169158 33441
Oval_Face 57567 145032
Pale_Skin 8701 193898
Pointy_Nose 56210 146389
Receding_Hairline 16163 186436
Rosy_Cheeks 13315 189284
Sideburns 11449 191150
Smiling 97669 104930
Straight_Hair 42222 160377
Wavy_Hair 64744 137855
Wearing_Earrings 38276 164323
Wearing_Hat 9818 192781
Wearing_Lipstick 95715 106884
Wearing_Necklace 24913 177686
Wearing_Necktie 14732 187867
Young 156734 45865
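
Raw counts are hard to compare at a glance, so it can help to convert them into a positive share and a majority-to-minority ratio. The sketch below does this for four rows copied from the listing above; `imbalance_ratio` is a hypothetical helper name, and it assumes both counts are non-zero.

```python
# (with attribute, without attribute) counts copied from the listing above.
counts = {
    "Bald": (4547, 198052),
    "Male": (84437, 118162),
    "Smiling": (97669, 104930),
    "Young": (156734, 45865),
}


def imbalance_ratio(pos, neg):
    """Majority-to-minority ratio; 1.0 means perfectly balanced.

    Assumes both counts are greater than zero.
    """
    return max(pos, neg) / min(pos, neg)


for name, (pos, neg) in counts.items():
    share = pos / (pos + neg)  # fraction of images that have the attribute
    ratio = imbalance_ratio(pos, neg)
    print(f"{name:10s} positive={share:6.1%} ratio={ratio:.1f}:1")
```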

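It is easy to see that some attributes are heavily imbalanced, with ratios of 10:1 or more (Bald, for example, is 4547:198052). Gender is relatively balanced by comparison, at 84437:118162 (Male to not Male), so it can be extracted and used as data for gender recognition.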
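
One way to pull out gender labels is to read only the `Male` column of the annotation file. This is a rough sketch under stated assumptions rather than a reference implementation: `load_gender_labels` is a hypothetical name, the two-row sample is invented, and the file layout is assumed to match what the counting script expects (image count, attribute names, then one row per image with values `1` or `-1`).

```python
import io


def load_gender_labels(lines):
    """Return a list of (filename, is_male) pairs from annotation lines."""
    it = iter(lines)
    num_images = int(next(it))
    attrs = next(it).split()
    male_col = attrs.index("Male")  # position of the Male attribute
    labels = []
    for _ in range(num_images):
        parts = next(it).split()
        filename, values = parts[0], parts[1:]
        # "1" means the Male attribute is present, "-1" means it is absent.
        labels.append((filename, values[male_col] == "1"))
    return labels


# Hypothetical sample; with the real file, pass open(attributepath) instead.
sample = io.StringIO(
    "2\n"
    "Bald Male\n"
    "000001.jpg -1 -1\n"
    "000002.jpg -1  1\n"
)
gender_labels = load_gender_labels(sample)
print(gender_labels)
```

The resulting pairs can then be joined with the image folder path to build a training list for a gender classifier.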

Reference:

A Detailed Introduction to the CelebA Dataset and Source Code for Extracting Its Attributes

Reposted from blog.csdn.net/minstyrain/article/details/83142056