Configuring the TPU and loading public datasets for the Kaggle competition Flower Classification with TPUs

Configuring the TPU

The official documentation (Tensor Processing Units (TPUs)) spells out how to configure the TPU quite clearly:
Once you have flipped the “Accelerator” switch in your notebook to “TPU v3-8”, this is how to enable TPU training in Tensorflow Keras:

# detect and init the TPU
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)

# instantiate a distribution strategy
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)

# instantiating the model in the strategy scope creates the model on the TPU
with tpu_strategy.scope():
    model = tf.keras.Sequential( … ) # define your model normally
    model.compile( … )
    
# train model normally
model.fit(training_dataset, epochs=EPOCHS, steps_per_epoch=…)

TPUs are network-connected accelerators and you must first locate them on the network. This is what TPUClusterResolver() does.

Two additional lines of boilerplate and you can define a TPUStrategy. This object contains the necessary distributed training code that will work on TPUs with their 8 compute cores (see hardware section below).

Finally, you use the TPUStrategy by instantiating your model in the scope of the strategy. This creates the model on the TPU. Model size is constrained by the TPU RAM only, not by the amount of memory available on the VM running your Python code. Model creation and model training use the usual Keras APIs.

To go fast on a TPU, increase the batch size. The rule of thumb is to use batches of 128 elements per core (ex: batch size of 128*8=1024 for a TPU with 8 cores). At this size, the 128x128 hardware matrix multipliers of the TPU (see hardware section below) are most likely to be kept busy. You start seeing interesting speedups from a batch size of 8 per core though. In the sample above, the batch size is scaled with the core count through this line of code:

BATCH_SIZE = 16 * tpu_strategy.num_replicas_in_sync
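
Plugging the 128-per-core rule of thumb into the same pattern would look like the sketch below (RULE_OF_THUMB_BATCH is a name I've made up here so it doesn't clobber the BATCH_SIZE used later):

# 128 examples per core: 128 * 8 = 1024 on a TPU v3-8
RULE_OF_THUMB_BATCH = 128 * tpu_strategy.num_replicas_in_sync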

Loading a public Kaggle dataset

The official documentation is far less friendly about how to load the dataset, especially for users like us who would rather not go through a VPN. First, the official guidance:

Because TPUs are very fast, many models ported to TPU end up with a data bottleneck. The TPU is sitting idle, waiting for data for the most part of each training epoch. TPUs read training data exclusively from GCS (Google Cloud Storage). And GCS can sustain a pretty large throughput if it is continuously streaming from multiple files in parallel. Following a couple of best practices will optimize the throughput:

For TPU training, organize your data in GCS in a reasonable number (10s to 100s) of reasonably large files (10s to 100s of MB).

With too few files, GCS will not have enough streams to get max throughput. With too many files, time will be wasted accessing each individual file.
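
As an aside, if you ever need to produce shards like that yourself, here is a minimal sketch using tf.io.TFRecordWriter (the serialize_example helper, the filenames, and the dummy data are my own illustration, not from the competition):

import tensorflow as tf

def serialize_example(label, image_bytes):
    # pack one (label, JPEG bytes) pair into a serialized tf.train.Example
    feature = {
        'class': tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

NUM_SHARDS = 16  # aim for 10s to 100s of files of 10s to 100s of MB each
examples = [serialize_example(0, b'fake-jpeg-bytes') for _ in range(1000)]  # dummy data
for shard in range(NUM_SHARDS):
    with tf.io.TFRecordWriter('flowers-%02d-of-%02d.tfrec' % (shard, NUM_SHARDS)) as writer:
        for record in examples[shard::NUM_SHARDS]:  # round-robin split across shards
            writer.write(record)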

Data for TPU training typically comes sharded across the appropriate number of larger files. The usual container format is TFRecords. You can load a dataset from TFRecords files by writing:

# On Kaggle you can also use KaggleDatasets().get_gcs_path() to obtain the GCS path of a Kaggle dataset
filenames = tf.io.gfile.glob("gs://flowers-public/tfrecords-jpeg-512x512/*.tfrec") # list files on GCS
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(...) # TFRecord decoding here...

What does this mean? The TPU computes so fast that if training data is read from local storage, the TPU sits idle most of the time and training hits a data bottleneck. To avoid this, Google requires everyone training models with TPUs on Kaggle to put their data on GCS (Google Cloud Storage). If you still read local files the way you used to for GPU training, you will get a pile of baffling errors.

Fortunately, Kaggle's public datasets are all mirrored on GCS, so we can read them directly. Straight to the code:

from kaggle_datasets import KaggleDatasets
gcs_path = KaggleDatasets().get_gcs_path('flower classification with tpus')
gcs_path  # if this raises an error, join the words in the name with underscores
# gcs_path is the URL of the dataset's root directory on GCS
We can use gsutil to see what directories exist under the returned gcs_path and find the training data (in a Kaggle notebook, run it as a shell command):

!gsutil ls gs://kds-b2e6cdbc4af76dcf0363776c09c12fe46872cab211d1de9f60ec7aec
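
If you would rather stay in Python, tf.io.gfile can do the same listing; a small sketch using the gcs_path returned above:

print(tf.io.gfile.listdir(gcs_path))  # top-level folders, e.g. tfrecords-jpeg-512x512
print(tf.io.gfile.listdir(gcs_path + '/tfrecords-jpeg-512x512'))  # subfolders (train / val / test in this competition's layout)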
dataset_filenames = tf.io.gfile.glob(r'gs://kds-b2e6cdbc4af76dcf0363776c09c12fe46872cab211d1de9f60ec7aec/tfrecords-jpeg-512x512/*a*/*.tfrec')
# the *a* pattern matches the train and val folders but skips test
dataset = tf.data.TFRecordDataset(dataset_filenames)
ignore_order = tf.data.Options()
ignore_order.experimental_deterministic = False  # allow out-of-order reads for faster streaming
dataset = dataset.with_options(ignore_order)
The competition description tells you what the .tfrec files contain, but the documented keys do not match reality: the official description lists id, label, img, while the files actually contain id, class, image.
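
You can confirm the real keys yourself by parsing a single raw record before mapping the dataset; a quick sketch (relies on eager execution, the default in these notebooks):

# peek at one serialized record and print its feature keys
for raw_record in dataset.take(1):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    print(sorted(example.features.feature.keys()))  # expect ['class', 'id', 'image']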
feature_description = {
    'class': tf.io.FixedLenFeature([], tf.int64),   # integer class label
    'image': tf.io.FixedLenFeature([], tf.string)   # JPEG-encoded image bytes
}
def dataset_decode(data):
    decode_data = tf.io.parse_single_example(data, feature_description)
    label = decode_data['class']
    image = tf.image.decode_jpeg(decode_data['image'], channels=3)
    image = tf.reshape(image, [512, 512, 3])  # TPUs need static tensor shapes
    image = tf.cast(image, tf.float32)
    image = (image - 127.5) / 127.5  # scale pixels to [-1, 1]
    return image, label
AUTO = tf.data.experimental.AUTOTUNE  # let tf.data tune the prefetch buffer size
# DATASET_SIZE is the total number of training examples, used here as the shuffle buffer
dataset = dataset.map(dataset_decode)
dataset = dataset.shuffle(DATASET_SIZE).repeat().batch(BATCH_SIZE).prefetch(AUTO)
print(dataset)

At this point the data is ready, and the rest works just like training on a GPU. Don't be afraid to turn the batch_size up.
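
One caveat: because the pipeline ends in .repeat(), the dataset is infinite, so model.fit needs an explicit steps_per_epoch. A minimal sketch, assuming model and EPOCHS are defined as in the official snippet above:

# with a repeated dataset, tell Keras how many batches make up one epoch
STEPS_PER_EPOCH = DATASET_SIZE // BATCH_SIZE
model.fit(dataset, epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH)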


Reposted from blog.csdn.net/dmcgow/article/details/104361655