Closed
Description
I have created a PyTorch model checkpoint using torch.save; however, I'm unable to load this model using torch.load. I run into the following error:
>>> torch.load('model_best.pth.tar')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/anaconda3/envs/pytorch_source/lib/python3.7/site-packages/torch/serialization.py", line 358, in load
    return _load(f, map_location, pickle_module)
  File "/home/ubuntu/anaconda3/envs/pytorch_source/lib/python3.7/site-packages/torch/serialization.py", line 549, in _load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: storage has wrong size: expected -7659745797817883467 got 512
The model was saved using code like this:
def save_checkpoint(epoch, model, best_top5, optimizer, is_best=False, filename='checkpoint.pth.tar'):
    state = {
        'epoch': epoch+1, 'state_dict': model.state_dict(),
        'best_top5': best_top5, 'optimizer': optimizer.state_dict(),
    }
    torch.save(state, filename)

if args.local_rank == 0:
    if is_best: save_checkpoint(epoch, model, best_top5, optimizer, is_best=True, filename='model_best.pth.tar')
The model was trained across multiple p3.16xlarge instances.
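
One possible factor, not confirmed in this report: args.local_rank == 0 is true for one process on every instance, so if the checkpoint directory were shared between instances (an assumption), several processes could write model_best.pth.tar at the same time and leave a corrupt file. The sketch below shows one way to rule that out by saving from a single global rank and writing through a temporary file. The helper name save_checkpoint_atomic and its global_rank and save_fn parameters are hypothetical and are not part of the original training script.

```python
import os
import tempfile


def save_checkpoint_atomic(state, filename, global_rank, save_fn=None):
    """Save `state` to `filename` from global rank 0 only (hypothetical helper).

    The data is first written to a temporary file in the target directory and
    then renamed over `filename`, so a reader should not see a half-written file.
    Returns True if this process wrote the checkpoint, False otherwise.
    """
    # Only one process in the whole job writes; all other ranks skip.
    if global_rank != 0:
        return False

    if save_fn is None:
        # Imported lazily so the helper can be exercised without PyTorch.
        import torch
        save_fn = torch.save

    directory = os.path.dirname(os.path.abspath(filename))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    os.close(fd)
    try:
        save_fn(state, tmp_path)
        os.replace(tmp_path, filename)
    except BaseException:
        # Clean up the partial temp file before re-raising.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

In a job launched with torch.distributed, the global rank could be taken from torch.distributed.get_rank() after the process group is initialized. Note that os.replace is atomic on a local POSIX filesystem; behavior on a network filesystem may differ.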