1. kitti --> voc --> darknet dataset preparation
1.1 Download the KITTI dataset
Link: https://pan.baidu.com/s/1rCxwday9E0TDxXZ6KekcjQ code: 43y2 (if the link expires you can message me, though replies may be slow).
images: download the first three parts into one folder and unzip data_object_image_2.zip; also download the label archive data_object_label_2.zip.
1.2 Create the working folders
a. Create a VOC_KITTI folder, and inside it create the folders Annotations, JPEGImages and labels:
Annotations holds the VOC xml label files
JPEGImages holds the downloaded images (data_object_image_2)
labels holds the label files from data_object_label_2 (7481 files)
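These folders can also be created with a short Python sketch (the VOC_KITTI layout is the one described above; adjust the path if yours differs):

```python
# Create the VOC_KITTI working folders
import os

base = 'VOC_KITTI'
for sub in ('Annotations', 'JPEGImages', 'labels'):
    os.makedirs(os.path.join(base, sub), exist_ok=True)  # no error if the folder already exists

print(sorted(os.listdir(base)))
```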
1.3 Remap the KITTI labels to the classes you actually need
The KITTI labels contain 8 classes: 'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram' and 'Misc'.
The script modify_annotations_txt.py below converts the original 8 classes into the 3 we need here: Car, Pedestrian, Cyclist. Car, Van, Truck and Tram are merged into Car; Pedestrian and Person_sitting are merged into Pedestrian; Cyclist stays unchanged (DontCare and Misc entries are dropped).
Create modify_annotations_txt.py:
# Edit txt_list near the top of the script to point at your own KITTI label directory.
# modify_annotations_txt.py
import glob

txt_list = glob.glob('./labels/data_object_label_2/training/label_2/*.txt')
# paths of all txt files in the labels folder

def show_category(txt_list):
    category_list = []
    for item in txt_list:
        try:
            with open(item) as tdf:
                for each_line in tdf:
                    labeldata = each_line.strip().split(' ')  # strip surrounding whitespace and split
                    category_list.append(labeldata[0])        # keep only the first field: the class
        except IOError as ioerr:
            print('File error:' + str(ioerr))
    print(set(category_list))  # print the set of classes

def merge(line):
    each_line = ''
    for i in range(len(line)):
        if i != (len(line) - 1):
            each_line = each_line + line[i] + ' '
        else:
            each_line = each_line + line[i]  # no trailing space after the last field
    each_line = each_line + '\n'
    return each_line

print('before modify categories are:\n')
show_category(txt_list)

for item in txt_list:
    new_txt = []
    try:
        with open(item, 'r') as r_tdf:
            for each_line in r_tdf:
                labeldata = each_line.strip().split(' ')
                if labeldata[0] in ['Truck', 'Van', 'Tram']:  # merge vehicle classes into Car
                    labeldata[0] = 'Car'
                if labeldata[0] == 'Person_sitting':          # merge into Pedestrian
                    labeldata[0] = 'Pedestrian'
                if labeldata[0] == 'DontCare':                # skip the DontCare class
                    continue
                if labeldata[0] == 'Misc':                    # skip the Misc class
                    continue
                new_txt.append(merge(labeldata))              # collect the rewritten lines
        with open(item, 'w+') as w_tdf:  # w+ truncates the original file and writes the new content
            for temp in new_txt:
                w_tdf.write(temp)
    except IOError as ioerr:
        print('File error:' + str(ioerr))

print('\nafter modify categories are:\n')
show_category(txt_list)
'''
# Notes
1. glob(): glob.glob('./.../*.txt') collects every .txt file under the given path.
2. Outer for loop over files, inner for loop over lines; each line is split and only its first field (the class) matters; if statements, continue.
3. merge() reassembles the split fields of a line back into a single string, joining with ' '.
4. Writing the new label file: open with w+, loop over the lines, .write().
5. show_category() collects the first field of every line (append) and prints the set()!
'''
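As a quick sanity check, the remapping rules can be tried on a single label line; remap_line below is a hypothetical helper that mirrors the script's logic, not part of the script itself:

```python
# Mirrors the class-remapping rules of modify_annotations_txt.py on one line
def remap_line(line):
    fields = line.strip().split(' ')
    if fields[0] in ['Truck', 'Van', 'Tram']:  # vehicle classes merge into Car
        fields[0] = 'Car'
    if fields[0] == 'Person_sitting':          # merged into Pedestrian
        fields[0] = 'Pedestrian'
    if fields[0] in ('DontCare', 'Misc'):      # dropped entirely
        return None
    return ' '.join(fields)

print(remap_line('Van 0.00 0 1.95 354.43 185.52 549.52 294.49 1.43 1.70 3.95 -2.39 1.66 11.80 1.76'))
print(remap_line('DontCare -1 -1 -10 737.69 163.56 790.86 197.98 -1 -1 -1 -1000 -1000 -1000 -10'))
```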
Take 000010.txt as an example.
# 000010.txt original
Car 0.80 0 -2.09 1013.39 182.46 1241.00 374.00 1.57 1.65 3.35 4.43 1.65 5.20 -1.42
Car 0.00 0 1.95 354.43 185.52 549.52 294.49 1.43 1.70 3.95 -2.39 1.66 11.80 1.76
Pedestrian 0.00 2 1.41 859.54 159.80 879.68 221.40 1.96 0.72 1.09 8.33 1.55 23.51 1.75
Car 0.00 0 -1.78 819.63 178.12 926.85 251.56 1.51 1.60 3.24 5.85 1.64 16.50 -1.44
Car 0.00 2 -1.69 800.54 178.06 878.75 230.56 1.45 1.74 4.10 6.87 1.62 22.05 -1.39
Car 0.00 0 1.80 558.55 179.04 635.05 230.61 1.54 1.68 3.79 -0.38 1.76 23.64 1.78
Car 0.00 2 1.77 598.30 178.68 652.25 218.17 1.49 1.52 3.35 0.64 1.74 29.07 1.79
Car 0.00 1 -1.67 784.59 178.04 839.98 220.10 1.53 1.65 4.37 7.88 1.75 28.53 -1.40
Car 0.00 1 1.92 663.74 175.36 707.21 204.15 1.64 1.45 3.48 4.50 1.80 42.85 2.02
DontCare -1 -1 -10 737.69 163.56 790.86 197.98 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 135.60 185.44 196.06 202.15 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 796.02 162.52 862.73 183.40 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 879.35 165.65 931.48 182.36 -1 -1 -1 -1000 -1000 -1000 -10
#000010.txt modified
Car 0.80 0 -2.09 1013.39 182.46 1241.00 374.00 1.57 1.65 3.35 4.43 1.65 5.20 -1.42
Car 0.00 0 1.95 354.43 185.52 549.52 294.49 1.43 1.70 3.95 -2.39 1.66 11.80 1.76
Pedestrian 0.00 2 1.41 859.54 159.80 879.68 221.40 1.96 0.72 1.09 8.33 1.55 23.51 1.75
Car 0.00 0 -1.78 819.63 178.12 926.85 251.56 1.51 1.60 3.24 5.85 1.64 16.50 -1.44
Car 0.00 2 -1.69 800.54 178.06 878.75 230.56 1.45 1.74 4.10 6.87 1.62 22.05 -1.39
Car 0.00 0 1.80 558.55 179.04 635.05 230.61 1.54 1.68 3.79 -0.38 1.76 23.64 1.78
Car 0.00 2 1.77 598.30 178.68 652.25 218.17 1.49 1.52 3.35 0.64 1.74 29.07 1.79
Car 0.00 1 -1.67 784.59 178.04 839.98 220.10 1.53 1.65 4.37 7.88 1.75 28.53 -1.40
Car 0.00 1 1.92 663.74 175.36 707.21 204.15 1.64 1.45 3.48 4.50 1.80 42.85 2.02
Field meanings: each ground-truth line holds 15 values: type, truncated, occluded, alpha, the 2D bbox (left, top, right, bottom), the 3D dimensions (height, width, length), the 3D location (x, y, z) and rotation_y. The KITTI devkit readme also lists a 16th value, score, but it only appears in detection result files, never in the ground-truth labels, which is why the explanation seems to have one value too many.
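For illustration, a small sketch that attaches the 15 field names from the KITTI devkit readme to the first line of 000010.txt:

```python
# Parse one KITTI ground-truth label line into named fields
FIELDS = ['type', 'truncated', 'occluded', 'alpha',
          'bbox_left', 'bbox_top', 'bbox_right', 'bbox_bottom',
          'height', 'width', 'length', 'x', 'y', 'z', 'rotation_y']

line = 'Car 0.80 0 -2.09 1013.39 182.46 1241.00 374.00 1.57 1.65 3.35 4.43 1.65 5.20 -1.42'
values = line.split(' ')
assert len(values) == len(FIELDS)  # 15 values in a ground-truth line
label = dict(zip(FIELDS, values))
print(label['type'], label['bbox_left'], label['bbox_bottom'])
```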
1.4 kitti(.txt) --> voc(.xml): create kitti_txt_to_xml.py
# kitti_txt_to_xml.py
# encoding:utf-8
# Build a VOC xml file from scratch as a DOM tree,
# converting each KITTI .txt label file into a voc .xml file.
from xml.dom.minidom import Document
import cv2
import os

def generate_xml(name, split_lines, img_size, class_ind):
    doc = Document()  # create the DOM document object
    annotation = doc.createElement('annotation')
    doc.appendChild(annotation)
    title = doc.createElement('folder')
    title_text = doc.createTextNode('KITTI')
    title.appendChild(title_text)
    annotation.appendChild(title)
    img_name = name + '.png'
    title = doc.createElement('filename')
    title_text = doc.createTextNode(img_name)
    title.appendChild(title_text)
    annotation.appendChild(title)
    source = doc.createElement('source')
    annotation.appendChild(source)
    title = doc.createElement('database')
    title_text = doc.createTextNode('The KITTI Database')
    title.appendChild(title_text)
    source.appendChild(title)
    title = doc.createElement('annotation')
    title_text = doc.createTextNode('KITTI')
    title.appendChild(title_text)
    source.appendChild(title)
    size = doc.createElement('size')
    annotation.appendChild(size)
    title = doc.createElement('width')
    title_text = doc.createTextNode(str(img_size[1]))
    title.appendChild(title_text)
    size.appendChild(title)
    title = doc.createElement('height')
    title_text = doc.createTextNode(str(img_size[0]))
    title.appendChild(title_text)
    size.appendChild(title)
    title = doc.createElement('depth')
    title_text = doc.createTextNode(str(img_size[2]))
    title.appendChild(title_text)
    size.appendChild(title)
    for split_line in split_lines:
        line = split_line.strip().split()
        if line[0] in class_ind:
            object = doc.createElement('object')
            annotation.appendChild(object)
            title = doc.createElement('name')
            title_text = doc.createTextNode(line[0])
            title.appendChild(title_text)
            object.appendChild(title)
            bndbox = doc.createElement('bndbox')
            object.appendChild(bndbox)
            title = doc.createElement('xmin')
            title_text = doc.createTextNode(str(int(float(line[4]))))
            title.appendChild(title_text)
            bndbox.appendChild(title)
            title = doc.createElement('ymin')
            title_text = doc.createTextNode(str(int(float(line[5]))))
            title.appendChild(title_text)
            bndbox.appendChild(title)
            title = doc.createElement('xmax')
            title_text = doc.createTextNode(str(int(float(line[6]))))
            title.appendChild(title_text)
            bndbox.appendChild(title)
            title = doc.createElement('ymax')
            title_text = doc.createTextNode(str(int(float(line[7]))))
            title.appendChild(title_text)
            bndbox.appendChild(title)
    # write the DOM object doc to a file
    f = open('Annotations/' + name + '.xml', 'w')
    f.write(doc.toprettyxml(indent=''))
    f.close()

if __name__ == '__main__':
    class_ind = ('Pedestrian', 'Car', 'Cyclist')
    cur_dir = os.getcwd()
    # e.g. /home/studieren/PycharmProjects/darknet-master/VOC_KITTI
    labels_dir = os.path.join(cur_dir, 'labels/data_object_label_2/training/label_2')
    # e.g. /home/studieren/PycharmProjects/keras-yolo3-master/VOC_KITTI/labels/data_object_label_2/training/label_2
    for parent, dirnames, filenames in os.walk(labels_dir):  # root dir, subdirectories, files
        for file_name in filenames:
            full_path = os.path.join(parent, file_name)  # full path of the label file
            f = open(full_path)
            split_lines = f.readlines()
            name = file_name[:-4]  # drop the 4-character .txt extension, keeping the base name
            img_name = name + '.png'
            img_path = os.path.join('./JPEGImages/data_object_image_2/training/image_2/', img_name)  # adjust to your own image directory
            img_size = cv2.imread(img_path).shape  # (height, width, channels)
            generate_xml(name, split_lines, img_size, class_ind)
    print('all txts have been converted into xmls')
# Three paths to adapt: the xml output directory ('Annotations/'), labels_dir (the input .txt labels),
# and img_path (the images, read only for their shape).
# input: 000010.txt (the modified file shown above)
# 000010.xml
<?xml version="1.0" ?>
<annotation>
<folder>KITTI</folder>
<filename>000010.png</filename>
<source>
<database>The KITTI Database</database>
<annotation>KITTI</annotation>
</source>
<size>
<width>1242</width>
<height>375</height>
<depth>3</depth>
</size>
<object>
<name>Car</name>
<bndbox>
<xmin>1013</xmin>
<ymin>182</ymin>
<xmax>1241</xmax>
<ymax>374</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>354</xmin>
<ymin>185</ymin>
<xmax>549</xmax>
<ymax>294</ymax>
</bndbox>
</object>
<object>
<name>Pedestrian</name>
<bndbox>
<xmin>859</xmin>
<ymin>159</ymin>
<xmax>879</xmax>
<ymax>221</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>819</xmin>
<ymin>178</ymin>
<xmax>926</xmax>
<ymax>251</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>800</xmin>
<ymin>178</ymin>
<xmax>878</xmax>
<ymax>230</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>558</xmin>
<ymin>179</ymin>
<xmax>635</xmax>
<ymax>230</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>598</xmin>
<ymin>178</ymin>
<xmax>652</xmax>
<ymax>218</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>784</xmin>
<ymin>178</ymin>
<xmax>839</xmax>
<ymax>220</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>663</xmin>
<ymin>175</ymin>
<xmax>707</xmax>
<ymax>204</ymax>
</bndbox>
</object>
</annotation>
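To double-check a generated file, it can be read back with xml.etree the same way the next conversion step will read it; a minimal sketch on an inline fragment of the xml above:

```python
# Read the size and the first object's bndbox back out of a generated VOC xml
import xml.etree.ElementTree as ET

xml_text = '''<annotation>
<size><width>1242</width><height>375</height><depth>3</depth></size>
<object><name>Car</name>
<bndbox><xmin>1013</xmin><ymin>182</ymin><xmax>1241</xmax><ymax>374</ymax></bndbox>
</object>
</annotation>'''

root = ET.fromstring(xml_text)
w = int(root.find('size/width').text)
h = int(root.find('size/height').text)
obj = root.find('object')
box = [int(obj.find('bndbox/' + t).text) for t in ('xmin', 'ymin', 'xmax', 'ymax')]
print(w, h, obj.find('name').text, box)
```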
1.5 voc(.xml) --> darknet .txt: xml_to_yolo_txt.py
Converts the top-left/bottom-right corner coordinates into normalized x y w h, and maps the class labels from names to the indices 0, 1, 2.
# xml_to_yolo_txt.py
# Place this script in the same directory as the VOC_KITTI folder.
import glob
import os
import xml.etree.ElementTree as ET

# Class names as they appear in the xml files; their order here fixes the class indices.
class_names = ['Car', 'Cyclist', 'Pedestrian']
# Path to the xml files
path = '/home/studieren/PycharmProjects/keras-yolo3-master/VOC_KITTI/Annotations/'

# Convert a single xml file to txt
def single_xml_to_txt(xml_file):
    tree = ET.parse(xml_file)
    root = tree.getroot()
    txt_path = os.path.splitext(xml_file)[0] + '.txt'  # output txt path next to the xml
    with open(txt_path, 'w') as txt_file:
        for member in root.findall('object'):
            picture_width = int(root.find('size')[0].text)
            picture_height = int(root.find('size')[1].text)
            class_name = member[0].text
            class_num = class_names.index(class_name)  # index of the class name
            box_x_min = int(member[1][0].text)  # top-left x
            box_y_min = int(member[1][1].text)  # top-left y
            box_x_max = int(member[1][2].text)  # bottom-right x
            box_y_max = int(member[1][3].text)  # bottom-right y
            # convert to normalized center position and width/height
            x_center = float(box_x_min + box_x_max) / (2 * picture_width)
            y_center = float(box_y_min + box_y_max) / (2 * picture_height)
            width = float(box_x_max - box_x_min) / picture_width
            height = float(box_y_max - box_y_min) / picture_height
            txt_file.write(str(class_num) + ' ' + str(x_center) + ' ' + str(y_center) + ' ' + str(width) + ' ' + str(height) + '\n')

# Convert every xml file in the folder to txt
def dir_xml_to_txt(path):
    for xml_file in glob.glob(path + '*.xml'):
        single_xml_to_txt(xml_file)

dir_xml_to_txt(path)
'''
Only the path variable above needs to be changed, to your own Annotations directory.
Input: Annotations/*.xml. 1. class names -> 0, 1, 2  2. top-left/bottom-right corners -> x y w h
Output: .txt files (class x y w h), normalized.
'''
000010.xml: as above
000010.txt:
0 0.9074074074074074 0.7413333333333333 0.18357487922705315 0.512
0 0.3635265700483092 0.6386666666666667 0.1570048309178744 0.2906666666666667
2 0.6996779388083736 0.5066666666666667 0.01610305958132045 0.16533333333333333
0 0.7024959742351047 0.572 0.0861513687600644 0.19466666666666665
0 0.6755233494363929 0.544 0.06280193236714976 0.13866666666666666
0 0.48027375201288247 0.5453333333333333 0.061996779388083734 0.136
0 0.5032206119162641 0.528 0.043478260869565216 0.10666666666666667
0 0.6533816425120773 0.5306666666666666 0.04428341384863124 0.112
0 0.5515297906602254 0.5053333333333333 0.03542673107890499 0.07733333333333334
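The first output line can be verified by hand: for the first Car box, xmin=1013, xmax=1241 and the image is 1242x375, so x_center = (1013+1241)/(2*1242) ≈ 0.9074, matching the output above. A sketch of the check:

```python
# Verify the normalized values of the first box of 000010.txt
w_img, h_img = 1242, 375                   # image width and height from the xml
xmin, ymin, xmax, ymax = 1013, 182, 1241, 374  # first Car box

x_center = (xmin + xmax) / (2 * w_img)
y_center = (ymin + ymax) / (2 * h_img)
width = (xmax - xmin) / w_img
height = (ymax - ymin) / h_img
print(x_center, y_center, width, height)
```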
1.6 Gather the finished labels and images under kitti_data
a. Create the kitti_data folder inside the darknet directory.
b. Inside kitti_data, create four folders: train_images, train_labels, val_images, val_labels.
Put the images from ./JPEGImages/data_object_image_2/training/image_2/ directly into train_images, and the files from Annotations into train_labels (only the .txt files are needed, but since the .xml and .txt files are hard to separate by hand, copying both also works). val_images and val_labels are left aside for now.
1.7 Create train.txt and val.txt (val is unused for now, so an empty file is enough)
kitti_train_val.py
# kitti_train_val.py
# Place this script in the same directory as the kitti_data folder.
import glob

path = 'kitti_data/'

def generate_train_and_val(image_path, txt_file):
    with open(txt_file, 'w') as tf:
        for jpg_file in glob.glob(image_path + '*.png'):
            tf.write(jpg_file + '\n')

generate_train_and_val(path + 'train_images/', path + 'train.txt')  # path of the generated train.txt
# generate_train_and_val(path + 'val_images/', path + 'val.txt')    # path of the generated val.txt
# writes the paths of the training images
2. Modify the configuration files
2.1 kitti.names
Car
Pedestrian
Cyclist
2.2 kitti.data: classes is 3, matching kitti.names; train points at the path of train.txt; valid likewise for val.txt; backup is where models are saved during and after training.
classes= 3
train = kitti_data/train.txt
valid = kitti_data/val.txt
names = kitti_data/kitti.names
backup = backup/
2.3 yolo-kitti.cfg, saved in the cfg folder.
Start from yolov3.cfg and modify it, working upward from the end of the file and searching for each [yolo] block.
In all three places change classes=80 to classes=3, and in the [convolutional] layer directly above each [yolo] change filters=255 to filters=24, since filters = (classes + 5) * 3: (80+5)*3 = 255 and (3+5)*3 = 24, where the 5 covers x, y, w, h and the objectness score.
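The filters arithmetic can be checked in one line: each of the 3 anchor boxes per scale predicts x, y, w, h, an objectness score, and one score per class:

```python
# filters of the conv layer before each [yolo] block: (classes + 5) * anchors_per_scale
def yolo_filters(classes, anchors_per_scale=3):
    return (classes + 5) * anchors_per_scale  # 5 = x, y, w, h, objectness

print(yolo_filters(80))  # COCO default
print(yolo_filters(3))   # our 3 KITTI classes
```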
3. Training
3.1 Download the ImageNet-pretrained model
It goes into the darknet install directory.
By default yolov3 starts from the darknet53 backbone weights; open a terminal in the darknet directory and download them with:
wget https://pjreddie.com/media/files/darknet53.conv.74
The download is very slow, so here is a Baidu netdisk link. (Wrong weights) I don't know where I downloaded this from, but these weights are wrong!!!
Link: https://pan.baidu.com/s/1rnOZNMbl_wSahx3IX0eqmg code: zn6j
Correct weights (tested successfully!):
https://pan.baidu.com/s/1t9pypomR6qRptfaOOjDhpg code: nvnh
3.2 Run the training command
From the darknet-master folder:
./darknet detector train kitti_data/kitti.data cfg/yolov3-kitti.cfg darknet53.conv.74 -gpus 0,1,2,3
I only have one GPU here, so just:
./darknet detector train kitti_data/kitti.data cfg/yolov3-kitti.cfg darknet53.conv.74 -gpus 0
3.3 Training process
One iteration takes me roughly 0.2 min.
(Because the wrong weight file was loaded, everything below comes out as nan.)
yolov3-kitti
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BFLOPs
1 conv 64 3 x 3 / 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BFLOPs
2 conv 32 1 x 1 / 1 208 x 208 x 64 -> 208 x 208 x 32 0.177 BFLOPs
3 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BFLOPs
4 res 1 208 x 208 x 64 -> 208 x 208 x 64
5 conv 128 3 x 3 / 2 208 x 208 x 64 -> 104 x 104 x 128 1.595 BFLOPs
6 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BFLOPs
7 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
8 res 5 104 x 104 x 128 -> 104 x 104 x 128
9 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BFLOPs
10 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
11 res 8 104 x 104 x 128 -> 104 x 104 x 128
12 conv 256 3 x 3 / 2 104 x 104 x 128 -> 52 x 52 x 256 1.595 BFLOPs
13 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
14 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
15 res 12 52 x 52 x 256 -> 52 x 52 x 256
16 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
17 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
18 res 15 52 x 52 x 256 -> 52 x 52 x 256
19 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
20 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
21 res 18 52 x 52 x 256 -> 52 x 52 x 256
22 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
23 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
24 res 21 52 x 52 x 256 -> 52 x 52 x 256
25 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
26 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
27 res 24 52 x 52 x 256 -> 52 x 52 x 256
28 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
29 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
30 res 27 52 x 52 x 256 -> 52 x 52 x 256
31 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
32 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
33 res 30 52 x 52 x 256 -> 52 x 52 x 256
34 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
35 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
36 res 33 52 x 52 x 256 -> 52 x 52 x 256
37 conv 512 3 x 3 / 2 52 x 52 x 256 -> 26 x 26 x 512 1.595 BFLOPs
38 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
39 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
40 res 37 26 x 26 x 512 -> 26 x 26 x 512
41 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
42 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
43 res 40 26 x 26 x 512 -> 26 x 26 x 512
44 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
45 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
46 res 43 26 x 26 x 512 -> 26 x 26 x 512
47 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
48 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
49 res 46 26 x 26 x 512 -> 26 x 26 x 512
50 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
51 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
52 res 49 26 x 26 x 512 -> 26 x 26 x 512
53 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
54 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
55 res 52 26 x 26 x 512 -> 26 x 26 x 512
56 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
57 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
58 res 55 26 x 26 x 512 -> 26 x 26 x 512
59 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
60 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
61 res 58 26 x 26 x 512 -> 26 x 26 x 512
62 conv 1024 3 x 3 / 2 26 x 26 x 512 -> 13 x 13 x1024 1.595 BFLOPs
63 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
64 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
65 res 62 13 x 13 x1024 -> 13 x 13 x1024
66 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
67 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
68 res 65 13 x 13 x1024 -> 13 x 13 x1024
69 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
70 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
71 res 68 13 x 13 x1024 -> 13 x 13 x1024
72 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
73 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
74 res 71 13 x 13 x1024 -> 13 x 13 x1024
75 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
76 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
77 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
78 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
79 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
80 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
81 conv 24 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 24 0.008 BFLOPs
82 yolo
83 route 79
84 conv 256 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BFLOPs
85 upsample 2x 13 x 13 x 256 -> 26 x 26 x 256
86 route 85 61
87 conv 256 1 x 1 / 1 26 x 26 x 768 -> 26 x 26 x 256 0.266 BFLOPs
88 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
89 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
90 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
91 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
92 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
93 conv 24 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 24 0.017 BFLOPs
94 yolo
95 route 91
96 conv 128 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BFLOPs
97 upsample 2x 26 x 26 x 128 -> 52 x 52 x 128
98 route 97 36
99 conv 128 1 x 1 / 1 52 x 52 x 384 -> 52 x 52 x 128 0.266 BFLOPs
100 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
101 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
102 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
103 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
104 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
105 conv 24 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 24 0.033 BFLOPs
106 yolo
Loading weights from darknet53.conv.74...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
576
Loaded: 0.000027 seconds
Region 82 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 2
Region 94 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000000, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000000, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
Region 82 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
Region 94 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 6
Region 82 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
Region 94 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
...
After loading the correct weights, training proceeds normally.
4. Training results
Saved in the backup folder. My machine is not great, so I only ran 200 iterations, with a checkpoint saved every 100 iterations.
5. Testing
Edit the .cfg file: comment out the training batch and subdivisions lines and uncomment the testing ones.
./darknet detector test kitti_data/kitti.data cfg/yolov3-kitti.cfg backup/yolov3-kitti.backup data/000005.png
6. Resuming an interrupted training run
The only change is that the weight file is replaced by the checkpoint from however many steps you already trained; everything else stays the same.
./darknet detector train kitti_data/kitti.data cfg/yolov3-kitti.cfg backup/yolov3-kitti.backup -gpus 0,1,2,3
7. Problems encountered
7.1 Installing opencv with conda: see https://blog.csdn.net/qq_40297851/article/details/104902363
7.2 "0 Cuda malloc failed"
(pointpillars) studieren@studieren-GS65-Stealth-Thin-8RE:~/PycharmProjects/darknet-master$ ./darknet detector train kitti_data/kitti.data cfg/yolov3-kitti.cfg darknet53.conv.74 -gpu 0
yolov3-kitti
layer filters size input output
0 Cuda malloc failed
: File exists
darknet: ./src/utils.c:256: error: Assertion `0' failed.
Aborted (core dumped)
I searched a lot without finding this issue; after reading other people's training write-ups I changed two things:
a. reduced width and height in yolo-kitti.cfg; b. changed batch and subdivisions, setting subdivisions=64 (this was the decisive fix!!!).
batch=64: parameters are updated once per batch of samples.
subdivisions=16: if GPU memory is too small, the batch is split into subdivisions sub-batches, each of size batch/subdivisions.
If your hardware struggles, reduce batch or increase subdivisions, but always keep subdivisions <= batch.
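In other words, darknet loads batch images per iteration but pushes them through the network in batch/subdivisions chunks, and only one chunk must fit in GPU memory at a time. A quick sketch of the trade-off:

```python
# Images held in GPU memory per forward/backward pass for a batch / subdivisions pair
def mini_batch(batch, subdivisions):
    assert subdivisions <= batch and batch % subdivisions == 0
    return batch // subdivisions

print(mini_batch(64, 16))  # 4 images in memory at a time
print(mini_batch(64, 64))  # 1 image at a time: the smallest memory footprint
```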
8. References
Training process reference:
https://blog.csdn.net/qq583083658/article/details/86321987 (main reference!!!)
subdivisions reference
416x416 reference:
https://blog.csdn.net/kevineeo/article/details/84572589
Thanks!!!
9. Addendum: changing the model save interval
The save interval is set in detector.c (around line 138). I hadn't noticed it, which is painful once training is under way: past iteration 900 the weights are only saved every 10000 iterations, a real trap. You can change it to save every 500 iterations instead.
10. Runtime notes (one TITAN X GPU)
At the start each iteration took just over 3 s, then it crept up to 5-6 s or more, so who knows when training will finish. On my own machine it is 20+ seconds per iteration.
Now it is back down to about 3 s.
Checking GPU memory usage:
watch -n 10 nvidia-smi
Fan: fan speed (0%-100%); N/A means there is no fan
Temp: GPU temperature (an overheating GPU lowers its clock frequency)
Perf: performance state, from P0 (maximum performance) to P12 (minimum performance)
Pwr: GPU power draw
Persistence-M: persistence-mode state (persistence mode uses more power, but new GPU applications start faster)
Bus-Id: GPU bus, domain:bus:device.function
Disp.A: Display Active, whether the GPU's display output is initialized
Memory-Usage: GPU memory usage
Volatile GPU-Util: GPU utilization
ECC: whether error checking and correction is enabled, 0/DISABLED, 1/ENABLED
Compute M.: compute mode, 0/DEFAULT, 1/EXCLUSIVE_PROCESS, 2/PROHIBITED
You can see utilization is not high; this run used 64,16, next time I'll try 64,4.