This week I was able to get GNN to run without errors on a multi-Thor, multi-GPU system, on a bare metal setup on AWS! There seem to have been more roadblocks than usual this year, from one type of bug or another. It is designed so that each Thor gets one GPU, and if there are more Thors than GPUs, the extra Thors run the GNN on the CPU. This of course negates any benefit of using a GPU, since we still have to wait for the CPU-bound Thors to finish, but it avoids crashing and avoids having two separate processes share one GPU at the same time. Users need to be told that their setup requires a 1:1 Thor:GPU ratio or, at the very least, that they should limit which Thors are used for GNN if they want GPU acceleration.
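To make the assignment rule concrete, here is a minimal sketch of the idea, not the actual GNN code. The function names `assign_device` and `plan` are hypothetical, the Thors are assumed to be numbered from 0, and the device strings simply follow the TensorFlow naming style (`/GPU:0`, `/CPU:0`).

```python
from typing import List


def assign_device(thor_index: int, num_gpus: int) -> str:
    """Pick a device for one Thor (hypothetical helper, for illustration only).

    Thor i gets GPU i while GPUs remain; any further Thor falls back to the CPU.
    """
    if thor_index < 0 or num_gpus < 0:
        raise ValueError("thor_index and num_gpus must be non-negative")
    if thor_index < num_gpus:
        return f"/GPU:{thor_index}"
    # More Thors than GPUs: run on the CPU rather than share a GPU.
    return "/CPU:0"


def plan(num_thors: int, num_gpus: int) -> List[str]:
    """Return the device chosen for each Thor, in Thor order."""
    return [assign_device(i, num_gpus) for i in range(num_thors)]


if __name__ == "__main__":
    # 4 Thors, 2 GPUs: the last two Thors end up on the CPU.
    print(plan(4, 2))  # ['/GPU:0', '/GPU:1', '/CPU:0', '/CPU:0']
```

With a 1:1 ratio, for example `plan(2, 2)`, every Thor lands on its own GPU, which is the setup users would want for real acceleration.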
Robert Kennedy