Description
It seems that the ibverbs transport doesn't support transferring messages larger than the device's max_msg_sz.
The max_msg_sz of my device is 1 GB (0x40000000), and when I transfer more than 1 GB of data (for example, 1 GB + 4 bytes), the error below occurs.
$ ibv_devinfo -v | grep max_msg_sz
max_msg_sz: 0x40000000
max_msg_sz: 0x40000000
max_msg_sz: 0x40000000
max_msg_sz: 0x40000000
max_msg_sz: 0x40000000
max_msg_sz: 0x40000000
max_msg_sz: 0x40000000
max_msg_sz: 0x40000000
Command:
./gloo/benchmark/benchmark -s 2 -r 0 -h 110.110.8.158 -p 6379 -x 123 -t ibverbs --no-verify broadcast --messages 1 --elements 268435457
./gloo/benchmark/benchmark -s 2 -r 1 -h 110.110.8.158 -p 6379 -x 123 -t ibverbs --no-verify broadcast --messages 1 --elements 268435457
Error:
[/nvme1/chenxin/ws/test/gloo/gloo/transport/ibverbs/pair.cc:587] ERROR LID: 0 QPN: 5728 PSN: 3360883->LID: 0 QPN: 5727 PSN: 3360883: Exception in handleCompletion: [enforce fail at /nvme1/chenxin/ws/test/gloo/gloo/transport/ibverbs/pair.cc:681] wc->status == IBV_WC_SUCCESS. 1 vs 0. Memory region recv for slot 0: local length error
[/nvme1/chenxin/ws/test/gloo/gloo/transport/ibverbs/device.cc:230] ERROR Exception while handling completion event: [enforce fail at /nvme1/chenxin/ws/test/gloo/gloo/transport/ibverbs/pair.cc:681] wc->status == IBV_WC_SUCCESS. 1 vs 0. Memory region recv for slot 0: local length error
terminate called after throwing an instance of 'gloo::EnforceNotMet'
what(): [enforce fail at /nvme1/chenxin/ws/test/gloo/gloo/transport/ibverbs/pair.cc:681] wc->status == IBV_WC_SUCCESS. 1 vs 0. Memory region recv for slot 0: local length error
Aborted (core dumped)