Error:
Starting training ...
Traceback (most recent call last):
  File "train_retrieval_chembl.py", line 401, in <module>
    run_training()
  File "train_retrieval_chembl.py", line 388, in run_training
    train(
  File "train_retrieval_chembl.py", line 326, in train
    loss = train_step(
  File "train_retrieval_chembl.py", line 291, in train_step
    (loss, loss_reduced) = forward_step_func(data_iterator, model)
  File "train_retrieval_chembl.py", line 253, in forward_step
    batch = get_batch(data_iterator)
  File "train_retrieval_chembl.py", line 221, in get_batch
    data_b = mpu.broadcast_data(keys, data, datatype)
  File "/mnt/d/Pycharm_workspace/DoubleTarget/RetMol/MolBART/megatron_molbart/Megatron-LM-v1.1.5-3D_parallelism/megatron/mpu/data.py", line 88, in broadcast_data
    key_size, key_numel, total_numel = _build_key_size_numel_dictionaries(keys,
  File "/mnt/d/Pycharm_workspace/DoubleTarget/RetMol/MolBART/megatron_molbart/Megatron-LM-v1.1.5-3D_parallelism/megatron/mpu/data.py", line 42, in _build_key_size_numel_dictionaries
    assert data[key].dim() < max_dim, 'you should increase MAX_DATA_DIM'
TypeError: 'DataLoader' object is not subscriptable
The key line is "TypeError: 'DataLoader' object is not subscriptable": data is a 'DataLoader' object, and a DataLoader does not support indexing, so the lookup data[key] fails.
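The failure mode can be reproduced in a few lines. This is only a minimal sketch: FakeLoader is a hypothetical stand-in for torch.utils.data.DataLoader (iterable, but without __getitem__) so the snippet runs without PyTorch, and the key name "enc_ids" is made up for illustration. It is not the actual RetMol or Megatron code.

```python
class FakeLoader:
    """Hypothetical stand-in for a DataLoader: iterable, but not indexable."""

    def __init__(self, batches):
        self._batches = batches

    def __iter__(self):
        # Yields one batch (a dict of fields) at a time, like a real loader.
        return iter(self._batches)


def try_index(data, key):
    """Return data[key], or the error message if data cannot be indexed."""
    try:
        return data[key]
    except TypeError as exc:
        return f"TypeError: {exc}"


loader = FakeLoader([{"enc_ids": [1, 2, 3]}])

# Indexing the loader itself fails, as in the traceback above.
failed = try_index(loader, "enc_ids")

# Indexing a single batch drawn from the loader works.
batch = next(iter(loader))
ok = try_index(batch, "enc_ids")

print(failed)
print(ok)
```

In other words, broadcast_data expects a batch that can be indexed by key, so receiving the loader object itself at that point suggests the data pipeline was set up differently from what the script assumes.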
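The command I originally ran was "bash train_megatron_retrieval_chembl.sh".
After a lot of digging, it turned out that I had set up the wrong source dataset for the RetMol code. Switching to a different command, "bash train_megatron_retrieval.sh", fixed it.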