Parsing train-images-idx3-ubyte and train-labels-idx1-ubyte (MNIST dataset)

Category: Other | 2018-12-28 01:05:32

import numpy as np
import struct

def decode_idx3_ubyte(idx3_ubyte_file):
    """
    Generic function for parsing an idx3 file
    :param idx3_ubyte_file: path to the idx3 file
    :return: the images as an array of shape (num_images, num_rows, num_cols)
    """
    # Read the binary data; the with-block closes the file afterwards
    with open(idx3_ubyte_file, 'rb') as f:
        bin_data = f.read()

    # Parse the header: magic number, number of images, image height, image width
    # ('>iiii' means four big-endian 32-bit integers)
    offset = 0
    fmt_header = '>iiii'
    magic_number, num_images, num_rows, num_cols = struct.unpack_from(fmt_header, bin_data, offset)
    print('Magic number: %d, number of images: %d, image size: %d*%d' % (magic_number, num_images, num_rows, num_cols))

    # Parse the dataset: each image is num_rows * num_cols unsigned bytes
    image_size = num_rows * num_cols
    offset += struct.calcsize(fmt_header)
    fmt_image = '>' + str(image_size) + 'B'
    images = np.empty((num_images, num_rows, num_cols))
    for i in range(num_images):
        if (i + 1) % 10000 == 0:
            print('Parsed %d images' % (i + 1))
        images[i] = np.array(struct.unpack_from(fmt_image, bin_data, offset)).reshape((num_rows, num_cols))
        offset += struct.calcsize(fmt_image)
    return images
 
 
def decode_idx1_ubyte(idx1_ubyte_file):
    """
    Generic function for parsing an idx1 file
    :param idx1_ubyte_file: path to the idx1 file
    :return: the labels as an array of shape (num_labels,)
    """
    # Read the binary data; the with-block closes the file afterwards
    with open(idx1_ubyte_file, 'rb') as f:
        bin_data = f.read()

    # Parse the header: magic number and number of labels
    # ('>ii' means two big-endian 32-bit integers)
    offset = 0
    fmt_header = '>ii'
    magic_number, num_labels = struct.unpack_from(fmt_header, bin_data, offset)
    print('Magic number: %d, number of labels: %d' % (magic_number, num_labels))

    # Parse the dataset: each label is a single unsigned byte
    offset += struct.calcsize(fmt_header)
    fmt_label = '>B'
    labels = np.empty(num_labels)
    for i in range(num_labels):
        if (i + 1) % 10000 == 0:
            print('Parsed %d labels' % (i + 1))
        labels[i] = struct.unpack_from(fmt_label, bin_data, offset)[0]
        offset += struct.calcsize(fmt_label)
    return labels
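
The header layout used by both functions above (a magic number followed by big-endian dimension sizes, then raw unsigned bytes) can be checked on a small in-memory buffer. The following is a minimal sketch, not the original author's method: `parse_idx` is a hypothetical helper, the two buffers are synthetic stand-ins for the real MNIST files, and it assumes the data type byte in the magic number is 0x08 (unsigned byte), which matches magic numbers 2051 (images) and 2049 (labels). It reads the payload in one step with `np.frombuffer` instead of looping.

```python
import struct
import numpy as np


def parse_idx(buf):
    """Hypothetical helper: parse an unsigned-byte IDX buffer into an array."""
    # The 4-byte magic number is: two zero bytes, a type code, the number of dimensions
    zeros, type_code, ndim = struct.unpack_from('>HBB', buf, 0)
    if zeros != 0 or type_code != 0x08:
        raise ValueError('not an unsigned-byte IDX buffer')
    # Each dimension size is a big-endian unsigned 32-bit integer
    shape = struct.unpack_from('>' + 'I' * ndim, buf, 4)
    offset = 4 + 4 * ndim
    count = int(np.prod(shape))
    # Read the whole payload at once; the result is a read-only uint8 view
    data = np.frombuffer(buf, dtype=np.uint8, count=count, offset=offset)
    return data.reshape(shape)


# Synthetic buffers (assumed for illustration): two 2x2 images and two labels
images_buf = struct.pack('>iiii', 2051, 2, 2, 2) + bytes(range(8))
labels_buf = struct.pack('>ii', 2049, 2) + bytes([5, 0])

images = parse_idx(images_buf)
labels = parse_idx(labels_buf)
print(images.shape, labels.tolist())
```

Unlike the loop-based functions above, which return float64 arrays from `np.empty`, this sketch returns `uint8` arrays; call `.astype(np.float64)` on the result if the original dtype is needed.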

Reposted from blog.csdn.net/u010916338/article/details/85159686
