Error details
  File "~/miniconda3/envs/dx/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "~/miniconda3/envs/dx/lib/python3.8/site-packages/torch/utils/data/dataset.py", line 261, in __getitem__
    return tuple(tensor[index] for tensor in self.tensors)
  File "~/miniconda3/envs/dx/lib/python3.8/site-packages/torch/utils/data/dataset.py", line 261, in <genexpr>
    return tuple(tensor[index] for tensor in self.tensors)
RuntimeError: CUDA error: initialization error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Error cause
The data was wrapped in a TensorDataset and passed to a DataLoader, but the tensors were already on the GPU. With num_workers=8, each DataLoader worker subprocess has to index those CUDA tensors, and a forked worker process cannot re-initialize the parent process's CUDA context, which raises RuntimeError: CUDA error: initialization error.
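The traceback points into TensorDataset.__getitem__, which does nothing more than index each stored tensor. A minimal CPU-only illustration of that behavior (the data here is synthetic, standing in for data_x / data_y):

```python
import torch
from torch.utils.data import TensorDataset

# synthetic data standing in for data_x / data_y
x = torch.randn(10, 3)
y = torch.randn(10, 1)

ds = TensorDataset(x, y)

# __getitem__ returns tuple(tensor[index] for tensor in self.tensors),
# so every sample access indexes the underlying tensors directly
sample_x, sample_y = ds[0]
print(sample_x.shape, sample_y.shape)  # torch.Size([3]) torch.Size([1])
```

With num_workers=8, that indexing runs inside forked worker subprocesses; when the stored tensors live on the GPU, each worker must touch CUDA, which a forked subprocess cannot initialize.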
...
TRAINING_PROP = 0.8
num_data_sample = data_x.shape[0]
train_x = FloatTensor(data_x[:int(num_data_sample*TRAINING_PROP)]).to(DEVICE)
train_y = FloatTensor(data_y[:int(num_data_sample*TRAINING_PROP)]).to(DEVICE)
...
train_dataloder = DataLoader(dataset=TensorDataset(train_x, train_y),
                             batch_size=BATCH_SIZE,
                             shuffle=True,
                             num_workers=8)
...
for step, (batch_x, batch_y) in enumerate(train_dataloder):
    ...
    pred_y = model(batch_x)
    ...
Error solution
Keep the tensors handed to the DataLoader on the CPU, and move each batch to the GPU inside the training loop.
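An alternative, when the dataset is small enough to keep resident on the GPU, is to disable worker subprocesses instead: with num_workers=0 all indexing happens in the main process, which already owns the CUDA context. A sketch with synthetic data and a CPU fallback so it runs without a GPU (this is not the fix used below, just another option):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# CPU fallback so the sketch runs anywhere
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(100, 3).to(DEVICE)
y = torch.randn(100, 1).to(DEVICE)

# num_workers=0 loads batches in the main process, so no subprocess
# ever has to (re-)initialize CUDA
loader = DataLoader(TensorDataset(x, y), batch_size=32,
                    shuffle=True, num_workers=0)

for batch_x, batch_y in loader:
    # batches are already on DEVICE; no .to(DEVICE) needed here
    pass
```

The trade-off is losing parallel data loading, which usually matters little for data that already fits on the GPU.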
...
num_data_sample = data_x.shape[0]
# keep the tensors on the CPU; do not move them to the GPU here
train_x = FloatTensor(data_x[:int(num_data_sample*TRAINING_PROP)])
train_y = FloatTensor(data_y[:int(num_data_sample*TRAINING_PROP)])
...
train_dataloder = DataLoader(dataset=TensorDataset(train_x, train_y),
                             batch_size=BATCH_SIZE,
                             shuffle=True,
                             num_workers=8)
...
for step, (batch_x, batch_y) in enumerate(train_dataloder):
    # move each batch to the GPU inside the loop
    batch_x = batch_x.to(DEVICE)
    batch_y = batch_y.to(DEVICE)
    ...
    pred_y = model(batch_x)
    ...
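Putting the fix together, a self-contained sketch of the corrected pattern. The data is synthetic, the model is a stand-in Linear layer, DEVICE falls back to the CPU when no GPU is present, and num_workers is lowered to 0 only to keep the example light (once the tensors stay on the CPU, any num_workers value, including the original 8, is safe):

```python
import torch
from torch.nn import Linear
from torch.utils.data import DataLoader, TensorDataset

# --- assumptions: synthetic data, stand-in model, CPU fallback ---
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
BATCH_SIZE = 32
TRAINING_PROP = 0.8

data_x = torch.randn(100, 3)
data_y = torch.randn(100, 1)

num_data_sample = data_x.shape[0]
# keep the training tensors on the CPU
train_x = data_x[:int(num_data_sample * TRAINING_PROP)].float()
train_y = data_y[:int(num_data_sample * TRAINING_PROP)].float()

# with CPU tensors any num_workers value is safe; 0 keeps the sketch light
train_dataloder = DataLoader(dataset=TensorDataset(train_x, train_y),
                             batch_size=BATCH_SIZE,
                             shuffle=True,
                             num_workers=0)

model = Linear(3, 1).to(DEVICE)  # stand-in for the original model

for step, (batch_x, batch_y) in enumerate(train_dataloder):
    # move each batch to the GPU (or CPU fallback) inside the loop
    batch_x = batch_x.to(DEVICE)
    batch_y = batch_y.to(DEVICE)
    pred_y = model(batch_x)
```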