Analysis of wtalc-pytorch source code

Paper: W-TALC: Weakly-supervised Temporal Activity Localization and Classification

Code link: https://github.com/sujoyp/wtalc-pytorch

The main structure of the code is as follows:

Python file            Function
main.py                Main function (entry point)
options.py             Parameter configuration
video_dataset.py       Dataset splitting and loading
model.py               Weakly-supervised layer model
train.py               Training code
test.py                Test code
detectionMAP.py        Detection mAP
classificationMAP.py   Classification mAP

1.options.py is the parameter configuration.

parser = argparse.ArgumentParser(description='WTALC')
parser.add_argument('--lr', type=float, default=0.00001,help='learning rate (default: 0.0001)')
parser.add_argument('--batch-size', type=int, default=10, help='number of instances in a batch of data (default: 10)')
parser.add_argument('--model-name', default='weakloc', help='name to save model')
parser.add_argument('--pretrained-ckpt', default=None, help='ckpt for pretrained model')
parser.add_argument('--feature-size', default=2048, help='size of feature (default: 2048)')
parser.add_argument('--num-class', default=20, help='number of classes (default: )')
parser.add_argument('--dataset-name', default='Thumos14reduced', help='dataset to train on (default: )')
parser.add_argument('--max-seqlen', type=int, default=1200, help='maximum sequence length during training (default: 750)')
parser.add_argument('--Lambda', type=float, default=0.5, help='weight on Co-Activity Loss (default: 0.5)')
parser.add_argument('--num-similar', default=3, help='number of similar pairs in a batch of data  (default: 3)')
parser.add_argument('--seed', type=int, default=1, help='random seed (default: 1)')
parser.add_argument('--max-iter', type=int, default=100000, help='maximum iteration to train (default: 50000)')
parser.add_argument('--feature-type', type=str, default='I3D', help='type of feature to be used I3D or UNT (default: I3D)')

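--lr: learning rate (0.00001 in the code above, although the help string says 0.0001)

--batch-size: number of videos in a batch

--model-name: name under which the model is saved

--pretrained-ckpt: checkpoint of a pre-trained model

--feature-size: feature dimension

--num-class: number of classes

--dataset-name: name of the dataset

--max-seqlen: maximum sequence length during training (1200 in the code above, although the help string says 750)

--Lambda: weight of the Co-Activity Loss in the total loss

--num-similar: number of similar video pairs in a batch

--max-iter: maximum number of training iterations (100000 in the code above, although the help string says 50000)

--feature-type: the model used to extract the features (I3D or UNT)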
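
As a minimal sketch of how these flags end up as attributes, the snippet below uses a trimmed-down copy of the parser (only four of the options) and a made-up command line. It illustrates two standard argparse behaviours: hyphens in option names become underscores in attribute names, and an option declared without type= keeps a command-line value as a string.

```python
import argparse

# Trimmed-down copy of the parser above; only a few options are reproduced.
parser = argparse.ArgumentParser(description='WTALC')
parser.add_argument('--lr', type=float, default=0.00001)
parser.add_argument('--batch-size', type=int, default=10)
parser.add_argument('--feature-size', default=2048)   # no type=, as in the original
parser.add_argument('--Lambda', type=float, default=0.5)

# Hypothetical command line, for illustration only.
args = parser.parse_args(['--batch-size', '20', '--feature-size', '1024'])

print(args.lr)                            # not given, so the default is used
print(args.batch_size)                    # '--batch-size' becomes 'batch_size'
print(type(args.feature_size).__name__)   # a command-line value stays a string
print(args.Lambda)
```

If --feature-size or --num-class are ever overridden on the command line, adding type=int to their definitions would avoid passing strings further down.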

2.video_dataset.py handles dataset splitting and loading

2.1 __init__()

__init__() first reads the configuration of the dataset and then calls train_test_idx() and classwise_feature_mapping().

2.2 train_test_idx()

train_test_idx() splits the videos into a training set and a test set and records each video's index in the corresponding list

    def train_test_idx(self):
        for i, s in enumerate(self.subset):
            if s.decode('utf-8') == 'validation':   # Specific to Thumos14
                self.trainidx.append(i)  # index of a training video
            else:
                self.testidx.append(i)  # index of a test video
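
The following is a small standalone sketch of the same split, assuming a made-up list of subset tags. The tags are byte strings here because the original code calls decode('utf-8') on them.

```python
# Hypothetical subset tags; in the repo they are loaded by the Dataset class.
subset = [b'validation', b'test', b'validation', b'test', b'test']

trainidx, testidx = [], []
for i, s in enumerate(subset):
    if s.decode('utf-8') == 'validation':
        trainidx.append(i)   # videos tagged 'validation' are used for training
    else:
        testidx.append(i)    # everything else is used for testing

print(trainidx)  # [0, 2]
print(testidx)   # [1, 3, 4]
```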

2.3 classwise_feature_mapping()

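classwise_feature_mapping() groups the training videos by class: for every class it collects the indices of the training videos whose labels contain that class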

    def classwise_feature_mapping(self):
        for category in self.classlist:
            idx = []  # indices of the training videos that contain this class
            for i in self.trainidx:
                for label in self.labels[i]:
                    if label == category.decode('utf-8'):
                        idx.append(i)
                        break  # add each video at most once per class
            self.classwiseidx.append(idx)
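
To make the resulting structure concrete, here is a minimal sketch on made-up data (the class names, labels and indices are assumptions). A video with several labels appears in the list of every class it contains.

```python
classlist = [b'Diving', b'HighJump']    # hypothetical class names
labels = [['Diving'], ['HighJump'], ['Diving', 'HighJump'], ['Diving']]
trainidx = [0, 1, 2]                    # video 3 is assumed to be a test video

classwiseidx = []
for category in classlist:
    name = category.decode('utf-8')
    # Same result as the nested loop with break: one entry per matching video.
    idx = [i for i in trainidx if name in labels[i]]
    classwiseidx.append(idx)

print(classwiseidx)  # [[0, 2], [1, 2]]
```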

2.4 load_data()

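load_data() builds a training batch: it first samples n_similar pairs of videos that share a class, then fills the rest of the batch with random training videos, and finally returns the feature matrices and labels of all batch_size videos (10 by default, i.e. 5 pairs' worth). In test mode it returns one test video per call.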

    def load_data(self, n_similar=3, is_training=True):
        if is_training==True:
            features = []
            labels = []
            idx = []

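            # Load similar pairs: pick n_similar classes (3 by default), one pair per class
            rand_classid = np.random.choice(len(self.classwiseidx), size=n_similar)

            # For each chosen class, draw two training videos that contain it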
            for rid in rand_classid:
                rand_sampleid = np.random.choice(len(self.classwiseidx[rid]), size=2)
                idx.append(self.classwiseidx[rid][rand_sampleid[0]])
                idx.append(self.classwiseidx[rid][rand_sampleid[1]])

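            # idx now has 2*n_similar = 6 entries and is filled up to batch_size = 10
            # Load rest pairs: the remaining batch_size - 2*n_similar videos are drawn at random
            # and are not necessarily similar; CASL() only loops over the first 2*n_similar
            # videos, while MILL() uses the whole batch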
            rand_sampleid = np.random.choice(len(self.trainidx), size=self.batch_size-2*n_similar)

            for r in rand_sampleid:
                idx.append(self.trainidx[r])
            # Return the feature matrices and multi-hot labels of all batch_size videos
            return np.array([utils.process_feat(self.features[i], self.t_max) for i in idx]), np.array([self.labels_multihot[i] for i in idx])

        else:
            labs = self.labels_multihot[self.testidx[self.currenttestidx]]
            feat = self.features[self.testidx[self.currenttestidx]]

            if self.currenttestidx == len(self.testidx)-1:
                done = True; self.currenttestidx = 0
            else:
                done = False; self.currenttestidx += 1
         
            return np.array(feat), np.array(labs), done
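
The index sampling in the training branch can be sketched on its own as follows. The index tables are made up and stand in for self.classwiseidx and self.trainidx. One detail worth noting is that np.random.choice samples with replacement by default, so the same class can be chosen more than once and a pair can even consist of the same video twice.

```python
import numpy as np

np.random.seed(0)

# Hypothetical index tables standing in for self.classwiseidx and self.trainidx.
classwiseidx = [[0, 2, 5], [1, 2], [3, 4, 6, 7]]
trainidx = list(range(8))
batch_size, n_similar = 10, 3

idx = []
# One pair per randomly chosen class (sampling is with replacement).
for rid in np.random.choice(len(classwiseidx), size=n_similar):
    a, b = np.random.choice(len(classwiseidx[rid]), size=2)
    idx += [classwiseidx[rid][a], classwiseidx[rid][b]]

# Fill the remaining slots with arbitrary training videos.
for r in np.random.choice(len(trainidx), size=batch_size - 2 * n_similar):
    idx.append(trainidx[r])

print(idx[:2 * n_similar])   # consecutive entries form the similar pairs
print(idx[2 * n_similar:])   # the randomly drawn remainder
```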

3.model.py is the model part

model.py implements the weakly-supervised layer module: a fully connected layer with ReLU and dropout, followed by a linear classifier. It is very simple; reading the source code alongside the weak-supervision formula in the paper is enough.

class Model(torch.nn.Module):
    def __init__(self, n_feature, n_class):
        super(Model, self).__init__()

        self.fc = nn.Linear(n_feature, n_feature)
        self.fc1 = nn.Linear(n_feature, n_feature)
        self.classifier = nn.Linear(n_feature, n_class)
        self.dropout = nn.Dropout(0.7)

        self.apply(weights_init)

        #self.train()

    def forward(self, inputs, is_training=True):

        x = F.relu(self.fc(inputs))
        if is_training:
            x = self.dropout(x)
        #x = F.relu(self.fc1(x))
        #if is_training:
        #    x = self.dropout(x)

        
        return x, self.classifier(x)
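
To make the tensor shapes explicit, here is a NumPy sketch of the same forward pass at test time (no dropout). It is not the PyTorch model itself: the weights are random, and the small feature size is an assumption chosen to keep the example light.

```python
import numpy as np

rng = np.random.default_rng(0)
n_feature, n_class = 16, 20     # toy feature size; the option default above is 2048
batch, T = 2, 5                 # 2 videos with 5 snippets each

# Random stand-ins for the weights of self.fc and self.classifier.
W_fc, b_fc = rng.normal(size=(n_feature, n_feature)) * 0.1, np.zeros(n_feature)
W_cls, b_cls = rng.normal(size=(n_feature, n_class)) * 0.1, np.zeros(n_class)

inputs = rng.normal(size=(batch, T, n_feature))
x = np.maximum(inputs @ W_fc + b_fc, 0)    # fc + ReLU, applied to every snippet
element_logits = x @ W_cls + b_cls         # per-snippet class scores

print(x.shape)               # (2, 5, 16)
print(element_logits.shape)  # (2, 5, 20)
```

The two outputs correspond to the two values returned by forward(): the features x are used by CASL(), and the per-snippet scores are used by both losses.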

4.train.py is the training module

The main part of train.py computes the multiple-instance learning loss (MILL) and the Co-Activity Similarity Loss (CASL)

4.1 MILL() is the multiple-instance learning loss function

def MILL(element_logits, seq_len, batch_size, labels, device):
    ''' element_logits should be torch tensor of dimension (B, n_element, n_class),
         k should be numpy array of dimension (B,) indicating the top k locations to average over, 
         labels should be a numpy array of dimension (B, n_class) of 1 or 0
         return is a torch tensor of dimension (B, n_class) '''
    # k: number of top-scoring snippets averaged per video, e.g. [18 68 20 43 68 22 16 37 42 37]
    k = np.ceil(seq_len/8).astype('int32')
    labels = labels / torch.sum(labels, dim=1, keepdim=True)
    instance_logits = torch.zeros(0).to(device)
   

    for i in range(batch_size):
        # Keep only the first seq_len[i] (unpadded) rows of video i and, for each class,
        # take the k[i] highest scores along the time axis (dim 0)
        tmp, _ = torch.topk(element_logits[i][:seq_len[i]], k=int(k[i]), dim=0)  # [k[i], n_class]

        instance_logits = torch.cat([instance_logits, torch.mean(tmp, 0, keepdim=True)], dim=0)  # [1,20]
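    # Apply the formula from the paper: cross-entropy between the normalized labels
    # and the softmax of the video-level scores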
    milloss = -torch.mean(torch.sum(Variable(labels) * F.log_softmax(instance_logits, dim=1), dim=1), dim=0)
  
    return milloss
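
The same computation can be re-expressed in NumPy to check it on toy inputs. This is a sketch under assumptions (the function names mil_loss and log_softmax and the toy data are made up), not the repository's code: it averages the top-k scores of each class over time and then takes the cross-entropy against the normalized labels.

```python
import numpy as np

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)           # for numerical stability
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def mil_loss(element_logits, seq_len, labels):
    """element_logits: (B, T, C); seq_len: (B,); labels: (B, C) multi-hot."""
    k = np.ceil(seq_len / 8).astype(int)
    labels = labels / labels.sum(axis=1, keepdims=True)    # normalize to a distribution
    video_logits = []
    for i in range(len(seq_len)):
        valid = element_logits[i, :seq_len[i]]             # drop the padded rows
        topk = -np.sort(-valid, axis=0)[:k[i]]             # k[i] largest scores per class
        video_logits.append(topk.mean(axis=0))
    video_logits = np.stack(video_logits)                  # (B, C)
    return -(labels * log_softmax(video_logits, axis=1)).sum(axis=1).mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 16, 3))
seq_len = np.array([16, 9])
labels = np.array([[1., 0., 0.], [0., 1., 1.]])
print(mil_loss(logits, seq_len, labels))
```

As a sanity check, all-zero scores give a uniform softmax, so the loss equals log(n_class) regardless of the labels.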

4.2 CASL() is the Co-Activity Similarity Loss function

def CASL(x, element_logits, seq_len, n_similar, labels, device):
    ''' x is the torch tensor of feature from the last layer of model of dimension (n_similar, n_element, n_feature), 
        element_logits should be torch tensor of dimension (n_similar, n_element, n_class) 
        seq_len should be numpy array of dimension (B,)
        labels should be a numpy array of dimension (B, n_class) of 1 or 0 '''

    sim_loss = 0.
    n_tmp = 0.
    for i in range(0, n_similar*2, 2):
        # Normalize each class's activation scores along the time axis with softmax (temporal attention)
        atn1 = F.softmax(element_logits[i][:seq_len[i]], dim=0)
        atn2 = F.softmax(element_logits[i+1][:seq_len[i+1]], dim=0)

        n1 = torch.FloatTensor([np.maximum(seq_len[i]-1, 1)]).to(device)
        n2 = torch.FloatTensor([np.maximum(seq_len[i+1]-1, 1)]).to(device)
        # Class-wise feature vectors of the high-attention (Hf) and low-attention (Lf) regions
        Hf1 = torch.mm(torch.transpose(x[i][:seq_len[i]], 1, 0), atn1)
        Hf2 = torch.mm(torch.transpose(x[i+1][:seq_len[i+1]], 1, 0), atn2)
        Lf1 = torch.mm(torch.transpose(x[i][:seq_len[i]], 1, 0), (1 - atn1)/n1)
        Lf2 = torch.mm(torch.transpose(x[i+1][:seq_len[i+1]], 1, 0), (1 - atn2)/n2)
        # Cosine distance (1 - cosine similarity) measures how far apart two feature vectors are
        d1 = 1 - torch.sum(Hf1*Hf2, dim=0) / (torch.norm(Hf1, 2, dim=0) * torch.norm(Hf2, 2, dim=0))
        d2 = 1 - torch.sum(Hf1*Lf2, dim=0) / (torch.norm(Hf1, 2, dim=0) * torch.norm(Lf2, 2, dim=0))
        d3 = 1 - torch.sum(Hf2*Lf1, dim=0) / (torch.norm(Hf2, 2, dim=0) * torch.norm(Lf1, 2, dim=0))
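        # Ranking hinge loss with margin 0.5: the high-attention features of the pair (d1)
        # should be closer to each other than high-attention vs. low-attention features (d2, d3)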
        sim_loss = sim_loss + 0.5*torch.sum(torch.max(d1-d2+0.5, torch.FloatTensor([0.]).to(device))*Variable(labels[i,:])*Variable(labels[i+1,:]))
        sim_loss = sim_loss + 0.5*torch.sum(torch.max(d1-d3+0.5, torch.FloatTensor([0.]).to(device))*Variable(labels[i,:])*Variable(labels[i+1,:]))
        n_tmp = n_tmp + torch.sum(Variable(labels[i,:])*Variable(labels[i+1,:]))
    # Average over all (pair, shared class) terms in the batch
    sim_loss = sim_loss / n_tmp
    return sim_loss
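
For a single video pair, the steps above can be sketched in NumPy as follows. The helper names (softmax, cosine_distance, pair_terms) and the random toy inputs are assumptions made for illustration; the arithmetic mirrors the loop body of CASL().

```python
import numpy as np

def softmax(z, axis):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cosine_distance(a, b):
    # a, b: (n_feature, n_class); returns one distance per class (column)
    return 1 - (a * b).sum(axis=0) / (np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0))

def pair_terms(x1, logits1, x2, logits2, margin=0.5):
    """Per-class hinge terms for one pair; x*: (T, n_feature), logits*: (T, n_class)."""
    atn1, atn2 = softmax(logits1, axis=0), softmax(logits2, axis=0)   # attention over time
    n1, n2 = max(len(x1) - 1, 1), max(len(x2) - 1, 1)
    Hf1, Hf2 = x1.T @ atn1, x2.T @ atn2                               # high-attention features
    Lf1, Lf2 = x1.T @ ((1 - atn1) / n1), x2.T @ ((1 - atn2) / n2)     # low-attention features
    d1 = cosine_distance(Hf1, Hf2)
    d2 = cosine_distance(Hf1, Lf2)
    d3 = cosine_distance(Hf2, Lf1)
    return 0.5 * np.maximum(d1 - d2 + margin, 0) + 0.5 * np.maximum(d1 - d3 + margin, 0)

rng = np.random.default_rng(0)
x1, x2 = rng.random((12, 8)), rng.random((20, 8))        # toy non-negative features
l1, l2 = rng.normal(size=(12, 3)), rng.normal(size=(20, 3))
shared = np.array([1., 0., 1.])                          # labels[i] * labels[i+1]
terms = pair_terms(x1, l1, x2, l2)
loss = (terms * shared).sum() / shared.sum()
print(terms, loss)
```

Multiplying by the product of the two label vectors keeps only the classes that both videos contain, which is why the result is divided by the number of shared classes.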

5.test.py is the test module

test.py mainly calls dmAP() and cmAP() to compute the detection mAP and the classification mAP, respectively

For background on mAP, see: https://blog.csdn.net/better_boy/article/details/109334234
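
As a reminder of what the classification mAP measures, here is a generic sketch of average precision on made-up scores. It follows the textbook definition (precision averaged over the ranks of the positive videos, then averaged over classes) and is not claimed to match the repository's cmAP() line by line; the function name average_precision is hypothetical.

```python
def average_precision(scores, targets):
    """AP for one class: mean of precision@rank taken at the ranks of the positives."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if targets[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Toy video-level scores for 4 videos and 2 classes (made up).
scores = [[0.9, 0.2], [0.1, 0.8], [0.6, 0.7], [0.3, 0.1]]
targets = [[1, 0], [0, 1], [0, 1], [1, 0]]

aps = [average_precision([s[c] for s in scores], [t[c] for t in targets])
       for c in range(2)]
print(aps)                   # per-class AP
print(sum(aps) / len(aps))   # mAP over the two classes
```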

6.main.py

Finally, main.py ties the classes and functions above together.

6.1 Get parameter configuration

args = options.parser.parse_args()

6.2 Load data set

dataset = Dataset(args)

6.3 Instantiate the model and parameters

model = Model(dataset.feature_size, dataset.num_class).to(device)
optimizer = optim.Adam(model.parameters(), lr=args.lr, weight_decay=0.0005)

6.4 Run the training loop: call the training function at every iteration and, every 500 iterations, save the model and call the test function

    for itr in range(args.max_iter):
        train(itr, dataset, args, model, optimizer, logger, device)
        if itr % 500 == 0 and itr != 0:
            torch.save(model.state_dict(), './ckpt/' + args.model_name + '.pkl')
            test(itr, dataset, args, model, logger, device)

7. Function call graph

[Figure: function call graph (image not available)]

Origin blog.csdn.net/better_boy/article/details/109334972