1. kitti --> voc --> darknet dataset preparation
1.1 Download the KITTI dataset
Link: https://pan.baidu.com/s/1rCxwday9E0TDxXZ6KekcjQ code: 43y2 (if the link expires you can message me, though replies may be slow).
images: download the first three parts into one folder and unzip data_object_image_2.zip; also download the label archive data_object_label_2.zip.
1.2 Create the working folders
a. Create a VOC_KITTI folder, and inside it create the folders Annotations, JPEGImages and labels:
Annotations holds the VOC xml label files
JPEGImages holds the downloaded images (data_object_image_2)
labels holds the label files from data_object_label_2 (7481 files)
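These folders can also be created with a short Python sketch (the VOC_KITTI layout is the one described above; adjust the path if yours differs):

```python
# Create the VOC_KITTI working folders
import os

base = 'VOC_KITTI'
for sub in ('Annotations', 'JPEGImages', 'labels'):
    os.makedirs(os.path.join(base, sub), exist_ok=True)  # no error if the folder already exists

print(sorted(os.listdir(base)))
```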
1.3 Remap the KITTI labels to the classes you actually need
The KITTI labels contain 8 classes: 'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram' and 'Misc'.
The script modify_annotations_txt.py below converts the original 8 classes into the 3 we need here: Car, Pedestrian, Cyclist. Car, Van, Truck and Tram are merged into Car; Pedestrian and Person_sitting are merged into Pedestrian; Cyclist stays unchanged (DontCare and Misc entries are dropped).
Create modify_annotations_txt.py:
# Edit txt_list near the top of the script to point at your own KITTI label directory.
# modify_annotations_txt.py
import glob

txt_list = glob.glob('./labels/data_object_label_2/training/label_2/*.txt')
# paths of all txt files in the labels folder

def show_category(txt_list):
    category_list = []
    for item in txt_list:
        try:
            with open(item) as tdf:
                for each_line in tdf:
                    labeldata = each_line.strip().split(' ')  # strip surrounding whitespace and split
                    category_list.append(labeldata[0])        # keep only the first field: the class
        except IOError as ioerr:
            print('File error:' + str(ioerr))
    print(set(category_list))  # print the set of classes

def merge(line):
    each_line = ''
    for i in range(len(line)):
        if i != (len(line) - 1):
            each_line = each_line + line[i] + ' '
        else:
            each_line = each_line + line[i]  # no trailing space after the last field
    each_line = each_line + '\n'
    return each_line

print('before modify categories are:\n')
show_category(txt_list)

for item in txt_list:
    new_txt = []
    try:
        with open(item, 'r') as r_tdf:
            for each_line in r_tdf:
                labeldata = each_line.strip().split(' ')
                if labeldata[0] in ['Truck', 'Van', 'Tram']:  # merge vehicle classes into Car
                    labeldata[0] = 'Car'
                if labeldata[0] == 'Person_sitting':          # merge into Pedestrian
                    labeldata[0] = 'Pedestrian'
                if labeldata[0] == 'DontCare':                # skip the DontCare class
                    continue
                if labeldata[0] == 'Misc':                    # skip the Misc class
                    continue
                new_txt.append(merge(labeldata))              # collect the rewritten lines
        with open(item, 'w+') as w_tdf:  # w+ truncates the original file and writes the new content
            for temp in new_txt:
                w_tdf.write(temp)
    except IOError as ioerr:
        print('File error:' + str(ioerr))

print('\nafter modify categories are:\n')
show_category(txt_list)
'''
# Notes
1. glob(): glob.glob('./.../*.txt') collects every .txt file under the given path.
2. Outer for loop over files, inner for loop over lines; each line is split and only its first field (the class) matters; if statements, continue.
3. merge() reassembles the split fields of a line back into a single string, joining with ' '.
4. Writing the new label file: open with w+, loop over the lines, .write().
5. show_category() collects the first field of every line (append) and prints the set()!
'''
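As a quick sanity check, the remapping rules can be tried on a single label line; remap_line below is a hypothetical helper that mirrors the script's logic, not part of the script itself:

```python
# Mirrors the class-remapping rules of modify_annotations_txt.py on one line
def remap_line(line):
    fields = line.strip().split(' ')
    if fields[0] in ['Truck', 'Van', 'Tram']:  # vehicle classes merge into Car
        fields[0] = 'Car'
    if fields[0] == 'Person_sitting':          # merged into Pedestrian
        fields[0] = 'Pedestrian'
    if fields[0] in ('DontCare', 'Misc'):      # dropped entirely
        return None
    return ' '.join(fields)

print(remap_line('Van 0.00 0 1.95 354.43 185.52 549.52 294.49 1.43 1.70 3.95 -2.39 1.66 11.80 1.76'))
print(remap_line('DontCare -1 -1 -10 737.69 163.56 790.86 197.98 -1 -1 -1 -1000 -1000 -1000 -10'))
```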
Take 000010.txt as an example.
# 000010.txt original
Car 0.80 0 -2.09 1013.39 182.46 1241.00 374.00 1.57 1.65 3.35 4.43 1.65 5.20 -1.42
Car 0.00 0 1.95 354.43 185.52 549.52 294.49 1.43 1.70 3.95 -2.39 1.66 11.80 1.76
Pedestrian 0.00 2 1.41 859.54 159.80 879.68 221.40 1.96 0.72 1.09 8.33 1.55 23.51 1.75
Car 0.00 0 -1.78 819.63 178.12 926.85 251.56 1.51 1.60 3.24 5.85 1.64 16.50 -1.44
Car 0.00 2 -1.69 800.54 178.06 878.75 230.56 1.45 1.74 4.10 6.87 1.62 22.05 -1.39
Car 0.00 0 1.80 558.55 179.04 635.05 230.61 1.54 1.68 3.79 -0.38 1.76 23.64 1.78
Car 0.00 2 1.77 598.30 178.68 652.25 218.17 1.49 1.52 3.35 0.64 1.74 29.07 1.79
Car 0.00 1 -1.67 784.59 178.04 839.98 220.10 1.53 1.65 4.37 7.88 1.75 28.53 -1.40
Car 0.00 1 1.92 663.74 175.36 707.21 204.15 1.64 1.45 3.48 4.50 1.80 42.85 2.02
DontCare -1 -1 -10 737.69 163.56 790.86 197.98 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 135.60 185.44 196.06 202.15 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 796.02 162.52 862.73 183.40 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 879.35 165.65 931.48 182.36 -1 -1 -1 -1000 -1000 -1000 -10
#000010.txt modified
Car 0.80 0 -2.09 1013.39 182.46 1241.00 374.00 1.57 1.65 3.35 4.43 1.65 5.20 -1.42
Car 0.00 0 1.95 354.43 185.52 549.52 294.49 1.43 1.70 3.95 -2.39 1.66 11.80 1.76
Pedestrian 0.00 2 1.41 859.54 159.80 879.68 221.40 1.96 0.72 1.09 8.33 1.55 23.51 1.75
Car 0.00 0 -1.78 819.63 178.12 926.85 251.56 1.51 1.60 3.24 5.85 1.64 16.50 -1.44
Car 0.00 2 -1.69 800.54 178.06 878.75 230.56 1.45 1.74 4.10 6.87 1.62 22.05 -1.39
Car 0.00 0 1.80 558.55 179.04 635.05 230.61 1.54 1.68 3.79 -0.38 1.76 23.64 1.78
Car 0.00 2 1.77 598.30 178.68 652.25 218.17 1.49 1.52 3.35 0.64 1.74 29.07 1.79
Car 0.00 1 -1.67 784.59 178.04 839.98 220.10 1.53 1.65 4.37 7.88 1.75 28.53 -1.40
Car 0.00 1 1.92 663.74 175.36 707.21 204.15 1.64 1.45 3.48 4.50 1.80 42.85 2.02
Field meanings: each ground-truth line holds 15 values: type, truncated, occluded, alpha, the 2D bbox (left, top, right, bottom), the 3D dimensions (height, width, length), the 3D location (x, y, z) and rotation_y. The KITTI devkit readme also lists a 16th value, score, but it only appears in detection result files, never in the ground-truth labels, which is why the explanation seems to have one value too many.
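For illustration, a small sketch that attaches the 15 field names from the KITTI devkit readme to the first line of 000010.txt:

```python
# Parse one KITTI ground-truth label line into named fields
FIELDS = ['type', 'truncated', 'occluded', 'alpha',
          'bbox_left', 'bbox_top', 'bbox_right', 'bbox_bottom',
          'height', 'width', 'length', 'x', 'y', 'z', 'rotation_y']

line = 'Car 0.80 0 -2.09 1013.39 182.46 1241.00 374.00 1.57 1.65 3.35 4.43 1.65 5.20 -1.42'
values = line.split(' ')
assert len(values) == len(FIELDS)  # 15 values in a ground-truth line
label = dict(zip(FIELDS, values))
print(label['type'], label['bbox_left'], label['bbox_bottom'])
```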
1.4 kitti(.txt) --> voc(.xml): create kitti_txt_to_xml.py
# kitti_txt_to_xml.py
# encoding:utf-8
# Build a VOC xml file from scratch as a DOM tree,
# converting each KITTI .txt label file into a voc .xml file.
from xml.dom.minidom import Document
import cv2
import os

def generate_xml(name, split_lines, img_size, class_ind):
    doc = Document()  # create the DOM document object
    annotation = doc.createElement('annotation')
    doc.appendChild(annotation)
    title = doc.createElement('folder')
    title_text = doc.createTextNode('KITTI')
    title.appendChild(title_text)
    annotation.appendChild(title)
    img_name = name + '.png'
    title = doc.createElement('filename')
    title_text = doc.createTextNode(img_name)
    title.appendChild(title_text)
    annotation.appendChild(title)
    source = doc.createElement('source')
    annotation.appendChild(source)
    title = doc.createElement('database')
    title_text = doc.createTextNode('The KITTI Database')
    title.appendChild(title_text)
    source.appendChild(title)
    title = doc.createElement('annotation')
    title_text = doc.createTextNode('KITTI')
    title.appendChild(title_text)
    source.appendChild(title)
    size = doc.createElement('size')
    annotation.appendChild(size)
    title = doc.createElement('width')
    title_text = doc.createTextNode(str(img_size[1]))
    title.appendChild(title_text)
    size.appendChild(title)
    title = doc.createElement('height')
    title_text = doc.createTextNode(str(img_size[0]))
    title.appendChild(title_text)
    size.appendChild(title)
    title = doc.createElement('depth')
    title_text = doc.createTextNode(str(img_size[2]))
    title.appendChild(title_text)
    size.appendChild(title)
    for split_line in split_lines:
        line = split_line.strip().split()
        if line[0] in class_ind:
            object = doc.createElement('object')
            annotation.appendChild(object)
            title = doc.createElement('name')
            title_text = doc.createTextNode(line[0])
            title.appendChild(title_text)
            object.appendChild(title)
            bndbox = doc.createElement('bndbox')
            object.appendChild(bndbox)
            title = doc.createElement('xmin')
            title_text = doc.createTextNode(str(int(float(line[4]))))
            title.appendChild(title_text)
            bndbox.appendChild(title)
            title = doc.createElement('ymin')
            title_text = doc.createTextNode(str(int(float(line[5]))))
            title.appendChild(title_text)
            bndbox.appendChild(title)
            title = doc.createElement('xmax')
            title_text = doc.createTextNode(str(int(float(line[6]))))
            title.appendChild(title_text)
            bndbox.appendChild(title)
            title = doc.createElement('ymax')
            title_text = doc.createTextNode(str(int(float(line[7]))))
            title.appendChild(title_text)
            bndbox.appendChild(title)
    # write the DOM object doc to a file
    f = open('Annotations/' + name + '.xml', 'w')
    f.write(doc.toprettyxml(indent=''))
    f.close()

if __name__ == '__main__':
    class_ind = ('Pedestrian', 'Car', 'Cyclist')
    cur_dir = os.getcwd()
    # e.g. /home/studieren/PycharmProjects/darknet-master/VOC_KITTI
    labels_dir = os.path.join(cur_dir, 'labels/data_object_label_2/training/label_2')
    # e.g. /home/studieren/PycharmProjects/keras-yolo3-master/VOC_KITTI/labels/data_object_label_2/training/label_2
    for parent, dirnames, filenames in os.walk(labels_dir):  # root dir, subdirectories, files
        for file_name in filenames:
            full_path = os.path.join(parent, file_name)  # full path of the label file
            f = open(full_path)
            split_lines = f.readlines()
            name = file_name[:-4]  # drop the 4-character .txt extension, keeping the base name
            img_name = name + '.png'
            img_path = os.path.join('./JPEGImages/data_object_image_2/training/image_2/', img_name)  # adjust to your own image directory
            img_size = cv2.imread(img_path).shape  # (height, width, channels)
            generate_xml(name, split_lines, img_size, class_ind)
    print('all txts have been converted into xmls')
# Three paths to adapt: the xml output directory ('Annotations/'), labels_dir (the input .txt labels),
# and img_path (the images, read only for their shape).
# input: 000010.txt (the modified file shown above)
# 000010.xml
<?xml version="1.0" ?>
<annotation>
<folder>KITTI</folder>
<filename>000010.png</filename>
<source>
<database>The KITTI Database</database>
<annotation>KITTI</annotation>
</source>
<size>
<width>1242</width>
<height>375</height>
<depth>3</depth>
</size>
<object>
<name>Car</name>
<bndbox>
<xmin>1013</xmin>
<ymin>182</ymin>
<xmax>1241</xmax>
<ymax>374</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>354</xmin>
<ymin>185</ymin>
<xmax>549</xmax>
<ymax>294</ymax>
</bndbox>
</object>
<object>
<name>Pedestrian</name>
<bndbox>
<xmin>859</xmin>
<ymin>159</ymin>
<xmax>879</xmax>
<ymax>221</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>819</xmin>
<ymin>178</ymin>
<xmax>926</xmax>
<ymax>251</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>800</xmin>
<ymin>178</ymin>
<xmax>878</xmax>
<ymax>230</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>558</xmin>
<ymin>179</ymin>
<xmax>635</xmax>
<ymax>230</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>598</xmin>
<ymin>178</ymin>
<xmax>652</xmax>
<ymax>218</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>784</xmin>
<ymin>178</ymin>
<xmax>839</xmax>
<ymax>220</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>663</xmin>
<ymin>175</ymin>
<xmax>707</xmax>
<ymax>204</ymax>
</bndbox>
</object>
</annotation>
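To double-check a generated file, it can be read back with xml.etree the same way the next conversion step will read it; a minimal sketch on an inline fragment of the xml above:

```python
# Read the size and the first object's bndbox back out of a generated VOC xml
import xml.etree.ElementTree as ET

xml_text = '''<annotation>
<size><width>1242</width><height>375</height><depth>3</depth></size>
<object><name>Car</name>
<bndbox><xmin>1013</xmin><ymin>182</ymin><xmax>1241</xmax><ymax>374</ymax></bndbox>
</object>
</annotation>'''

root = ET.fromstring(xml_text)
w = int(root.find('size/width').text)
h = int(root.find('size/height').text)
obj = root.find('object')
box = [int(obj.find('bndbox/' + t).text) for t in ('xmin', 'ymin', 'xmax', 'ymax')]
print(w, h, obj.find('name').text, box)
```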
1.5 voc(.xml) --> darknet .txt: xml_to_yolo_txt.py
Converts the top-left/bottom-right corner coordinates into normalized x y w h, and maps the class labels from names to the indices 0, 1, 2.
# xml_to_yolo_txt.py
# Place this script in the same directory as the VOC_KITTI folder.
import glob
import os
import xml.etree.ElementTree as ET

# Class names as they appear in the xml files; their order here fixes the class indices.
class_names = ['Car', 'Cyclist', 'Pedestrian']
# Path to the xml files
path = '/home/studieren/PycharmProjects/keras-yolo3-master/VOC_KITTI/Annotations/'

# Convert a single xml file to txt
def single_xml_to_txt(xml_file):
    tree = ET.parse(xml_file)
    root = tree.getroot()
    txt_path = os.path.splitext(xml_file)[0] + '.txt'  # output txt path next to the xml
    with open(txt_path, 'w') as txt_file:
        for member in root.findall('object'):
            picture_width = int(root.find('size')[0].text)
            picture_height = int(root.find('size')[1].text)
            class_name = member[0].text
            class_num = class_names.index(class_name)  # index of the class name
            box_x_min = int(member[1][0].text)  # top-left x
            box_y_min = int(member[1][1].text)  # top-left y
            box_x_max = int(member[1][2].text)  # bottom-right x
            box_y_max = int(member[1][3].text)  # bottom-right y
            # convert to normalized center position and width/height
            x_center = float(box_x_min + box_x_max) / (2 * picture_width)
            y_center = float(box_y_min + box_y_max) / (2 * picture_height)
            width = float(box_x_max - box_x_min) / picture_width
            height = float(box_y_max - box_y_min) / picture_height
            txt_file.write(str(class_num) + ' ' + str(x_center) + ' ' + str(y_center) + ' ' + str(width) + ' ' + str(height) + '\n')

# Convert every xml file in the folder to txt
def dir_xml_to_txt(path):
    for xml_file in glob.glob(path + '*.xml'):
        single_xml_to_txt(xml_file)

dir_xml_to_txt(path)
'''
Only the path variable above needs to be changed, to your own Annotations directory.
Input: Annotations/*.xml. 1. class names -> 0, 1, 2  2. top-left/bottom-right corners -> x y w h
Output: .txt files (class x y w h), normalized.
'''
000010.xml: as above
000010.txt:
0 0.9074074074074074 0.7413333333333333 0.18357487922705315 0.512
0 0.3635265700483092 0.6386666666666667 0.1570048309178744 0.2906666666666667
2 0.6996779388083736 0.5066666666666667 0.01610305958132045 0.16533333333333333
0 0.7024959742351047 0.572 0.0861513687600644 0.19466666666666665
0 0.6755233494363929 0.544 0.06280193236714976 0.13866666666666666
0 0.48027375201288247 0.5453333333333333 0.061996779388083734 0.136
0 0.5032206119162641 0.528 0.043478260869565216 0.10666666666666667
0 0.6533816425120773 0.5306666666666666 0.04428341384863124 0.112
0 0.5515297906602254 0.5053333333333333 0.03542673107890499 0.07733333333333334
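The first output line can be verified by hand: for the first Car box, xmin=1013, xmax=1241 and the image is 1242x375, so x_center = (1013+1241)/(2*1242) ≈ 0.9074, matching the output above. A sketch of the check:

```python
# Verify the normalized values of the first box of 000010.txt
w_img, h_img = 1242, 375                   # image width and height from the xml
xmin, ymin, xmax, ymax = 1013, 182, 1241, 374  # first Car box

x_center = (xmin + xmax) / (2 * w_img)
y_center = (ymin + ymax) / (2 * h_img)
width = (xmax - xmin) / w_img
height = (ymax - ymin) / h_img
print(x_center, y_center, width, height)
```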
1.6 Gather the finished labels and images under kitti_data
a. Create the kitti_data folder inside the darknet directory.
b. Inside kitti_data, create four folders: train_images, train_labels, val_images, val_labels.
Put the images from ./JPEGImages/data_object_image_2/training/image_2/ directly into train_images, and the files from Annotations into train_labels (only the .txt files are needed, but since the .xml and .txt files are hard to separate by hand, copying both also works). val_images and val_labels are left aside for now.
1.7 Create train.txt and val.txt (val is unused for now, so an empty file is enough)
kitti_train_val.py
# kitti_train_val.py
# Place this script in the same directory as the kitti_data folder.
import glob

path = 'kitti_data/'

def generate_train_and_val(image_path, txt_file):
    with open(txt_file, 'w') as tf:
        for jpg_file in glob.glob(image_path + '*.png'):
            tf.write(jpg_file + '\n')

generate_train_and_val(path + 'train_images/', path + 'train.txt')  # path of the generated train.txt
# generate_train_and_val(path + 'val_images/', path + 'val.txt')    # path of the generated val.txt
# writes the paths of the training images
2. Modify the configuration files
2.1 kitti.names
Car
Pedestrian
Cyclist
2.2 kitti.data: classes is 3, matching kitti.names; train points at the path of train.txt; valid likewise for val.txt; backup is where models are saved during and after training.
classes= 3
train = kitti_data/train.txt
valid = kitti_data/val.txt
names = kitti_data/kitti.names
backup = backup/
2.3 yolo-kitti.cfg, saved in the cfg folder.
Start from yolov3.cfg and modify it, working upward from the end of the file and searching for each [yolo] block.
In all three places change classes=80 to classes=3, and in the [convolutional] layer directly above each [yolo] change filters=255 to filters=24, since filters = (classes + 5) * 3: (80+5)*3 = 255 and (3+5)*3 = 24, where the 5 covers x, y, w, h and the objectness score.
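The filters arithmetic can be checked in one line: each of the 3 anchor boxes per scale predicts x, y, w, h, an objectness score, and one score per class:

```python
# filters of the conv layer before each [yolo] block: (classes + 5) * anchors_per_scale
def yolo_filters(classes, anchors_per_scale=3):
    return (classes + 5) * anchors_per_scale  # 5 = x, y, w, h, objectness

print(yolo_filters(80))  # COCO default
print(yolo_filters(3))   # our 3 KITTI classes
```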
3. Training
3.1 Download the ImageNet-pretrained model
It goes into the darknet install directory.
By default yolov3 starts from the darknet53 backbone weights; open a terminal in the darknet directory and download them with:
wget https://pjreddie.com/media/files/darknet53.conv.74
The download is very slow, so here is a Baidu netdisk link. (Wrong weights) I don't know where I downloaded this from, but these weights are wrong!!!
Link: https://pan.baidu.com/s/1rnOZNMbl_wSahx3IX0eqmg code: zn6j
Correct weights (tested successfully!):
https://pan.baidu.com/s/1t9pypomR6qRptfaOOjDhpg code: nvnh
3.2 Run the training command
From the darknet-master folder:
./darknet detector train kitti_data/kitti.data cfg/yolov3-kitti.cfg darknet53.conv.74 -gpus 0,1,2,3
I only have one GPU here, so just:
./darknet detector train kitti_data/kitti.data cfg/yolov3-kitti.cfg darknet53.conv.74 -gpus 0
3.3 Training process
One iteration takes me roughly 0.2 min.
(Because the wrong weight file was loaded, everything below comes out as nan.)
yolov3-kitti
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BFLOPs
1 conv 64 3 x 3 / 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BFLOPs
2 conv 32 1 x 1 / 1 208 x 208 x 64 -> 208 x 208 x 32 0.177 BFLOPs
3 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BFLOPs
4 res 1 208 x 208 x 64 -> 208 x 208 x 64
5 conv 128 3 x 3 / 2 208 x 208 x 64 -> 104 x 104 x 128 1.595 BFLOPs
6 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BFLOPs
7 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
8 res 5 104 x 104 x 128 -> 104 x 104 x 128
9 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BFLOPs
10 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
11 res 8 104 x 104 x 128 -> 104 x 104 x 128
12 conv 256 3 x 3 / 2 104 x 104 x 128 -> 52 x 52 x 256 1.595 BFLOPs
13 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
14 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
15 res 12 52 x 52 x 256 -> 52 x 52 x 256
16 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
17 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
18 res 15 52 x 52 x 256 -> 52 x 52 x 256
19 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
20 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
21 res 18 52 x 52 x 256 -> 52 x 52 x 256
22 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
23 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
24 res 21 52 x 52 x 256 -> 52 x 52 x 256
25 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
26 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
27 res 24 52 x 52 x 256 -> 52 x 52 x 256
28 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
29 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
30 res 27 52 x 52 x 256 -> 52 x 52 x 256
31 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
32 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
33 res 30 52 x 52 x 256 -> 52 x 52 x 256
34 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
35 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
36 res 33 52 x 52 x 256 -> 52 x 52 x 256
37 conv 512 3 x 3 / 2 52 x 52 x 256 -> 26 x 26 x 512 1.595 BFLOPs
38 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
39 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
40 res 37 26 x 26 x 512 -> 26 x 26 x 512
41 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
42 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
43 res 40 26 x 26 x 512 -> 26 x 26 x 512
44 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
45 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
46 res 43 26 x 26 x 512 -> 26 x 26 x 512
47 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
48 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
49 res 46 26 x 26 x 512 -> 26 x 26 x 512
50 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
51 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
52 res 49 26 x 26 x 512 -> 26 x 26 x 512
53 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
54 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
55 res 52 26 x 26 x 512 -> 26 x 26 x 512
56 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
57 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
58 res 55 26 x 26 x 512 -> 26 x 26 x 512
59 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
60 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
61 res 58 26 x 26 x 512 -> 26 x 26 x 512
62 conv 1024 3 x 3 / 2 26 x 26 x 512 -> 13 x 13 x1024 1.595 BFLOPs
63 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
64 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
65 res 62 13 x 13 x1024 -> 13 x 13 x1024
66 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
67 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
68 res 65 13 x 13 x1024 -> 13 x 13 x1024
69 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
70 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
71 res 68 13 x 13 x1024 -> 13 x 13 x1024
72 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
73 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
74 res 71 13 x 13 x1024 -> 13 x 13 x1024
75 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
76 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
77 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
78 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
79 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
80 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
81 conv 24 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 24 0.008 BFLOPs
82 yolo
83 route 79
84 conv 256 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BFLOPs
85 upsample 2x 13 x 13 x 256 -> 26 x 26 x 256
86 route 85 61
87 conv 256 1 x 1 / 1 26 x 26 x 768 -> 26 x 26 x 256 0.266 BFLOPs
88 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
89 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
90 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
91 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
92 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
93 conv 24 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 24 0.017 BFLOPs
94 yolo
95 route 91
96 conv 128 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BFLOPs
97 upsample 2x 26 x 26 x 128 -> 52 x 52 x 128
98 route 97 36
99 conv 128 1 x 1 / 1 52 x 52 x 384 -> 52 x 52 x 128 0.266 BFLOPs
100 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
101 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
102 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
103 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
104 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
105 conv 24 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 24 0.033 BFLOPs
106 yolo
Loading weights from darknet53.conv.74...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
576
Loaded: 0.000027 seconds
Region 82 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 2
Region 94 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000000, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000000, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
Region 82 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
Region 94 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 6
Region 82 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
Region 94 Avg IOU: 0.000000, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
...
After loading the correct weights, training proceeds normally.
4. Training results
Saved in the backup folder. My machine is not great, so I only ran 200 iterations, with a checkpoint saved every 100 iterations.
5. Testing
Edit the .cfg file: comment out the training batch and subdivisions lines and uncomment the testing ones.
./darknet detector test kitti_data/kitti.data cfg/yolov3-kitti.cfg backup/yolov3-kitti.backup data/000005.png
6. Resuming an interrupted training run
The only change is that the weight file is replaced by the checkpoint from however many steps you already trained; everything else stays the same.
./darknet detector train kitti_data/kitti.data cfg/yolov3-kitti.cfg backup/yolov3-kitti.backup -gpus 0,1,2,3
7. Problems encountered
7.1 Installing opencv with conda: see https://blog.csdn.net/qq_40297851/article/details/104902363
7.2 "0 Cuda malloc failed"
(pointpillars) studieren@studieren-GS65-Stealth-Thin-8RE:~/PycharmProjects/darknet-master$ ./darknet detector train kitti_data/kitti.data cfg/yolov3-kitti.cfg darknet53.conv.74 -gpu 0
yolov3-kitti
layer filters size input output
0 Cuda malloc failed
: File exists
darknet: ./src/utils.c:256: error: Assertion `0' failed.
Aborted (core dumped)
I searched a lot without finding this issue; after reading other people's training write-ups I changed two things:
a. reduced width and height in yolo-kitti.cfg; b. changed batch and subdivisions, setting subdivisions=64 (this was the decisive fix!!!).
batch=64: parameters are updated once per batch of samples.
subdivisions=16: if GPU memory is too small, the batch is split into subdivisions sub-batches, each of size batch/subdivisions.
If your hardware struggles, reduce batch or increase subdivisions, but always keep subdivisions <= batch.
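In other words, darknet loads batch images per iteration but pushes them through the network in batch/subdivisions chunks, and only one chunk must fit in GPU memory at a time. A quick sketch of the trade-off:

```python
# Images held in GPU memory per forward/backward pass for a batch / subdivisions pair
def mini_batch(batch, subdivisions):
    assert subdivisions <= batch and batch % subdivisions == 0
    return batch // subdivisions

print(mini_batch(64, 16))  # 4 images in memory at a time
print(mini_batch(64, 64))  # 1 image at a time: the smallest memory footprint
```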
8. References
Training process reference:
https://blog.csdn.net/qq583083658/article/details/86321987 (main reference!!!)
subdivisions reference
416x416 reference:
https://blog.csdn.net/kevineeo/article/details/84572589
Thanks!!!
9. Addendum: changing the model save interval
The save interval is set in detector.c (around line 138). I hadn't noticed it, which is painful once training is under way: past iteration 900 the weights are only saved every 10000 iterations, a real trap. You can change it to save every 500 iterations instead.
10. Runtime notes (one TITAN X GPU)
At the start each iteration took just over 3 s, then it crept up to 5-6 s or more, so who knows when training will finish. On my own machine it is 20+ seconds per iteration.
Now it is back down to about 3 s.
Checking GPU memory usage:
watch -n 10 nvidia-smi
Fan: fan speed (0%-100%); N/A means there is no fan
Temp: GPU temperature (an overheating GPU lowers its clock frequency)
Perf: performance state, from P0 (maximum performance) to P12 (minimum performance)
Pwr: GPU power draw
Persistence-M: persistence-mode state (persistence mode uses more power, but new GPU applications start faster)
Bus-Id: GPU bus, domain:bus:device.function
Disp.A: Display Active, whether the GPU's display output is initialized
Memory-Usage: GPU memory usage
Volatile GPU-Util: GPU utilization
ECC: whether error checking and correction is enabled, 0/DISABLED, 1/ENABLED
Compute M.: compute mode, 0/DEFAULT, 1/EXCLUSIVE_PROCESS, 2/PROHIBITED
You can see utilization is not high; this run used 64,16, next time I'll try 64,4.