It’s already August, and there are only three more weeks left in this internship! Time certainly flies. Per a recommendation from my mentor, who knows far more about HPCC than I do, I set up a three-computer cluster where one computer is the main node for everything except the Thor slave processes, and the other two handle the computationally expensive tasks. For example, with a 16-node Thor system on these three computers, the first computer only needs a decent CPU, HDD, and network connection; the other two hold the GPUs, eight each in this case. I will be using AWS’s P2 instance family, which provides the NVIDIA K80. I’m not sure if I’ve mentioned this before, but for GPU acceleration to work you are essentially limited to an NVIDIA GPU at the time of writing. The p2.8xlarge has 8 GPUs each, so two of them supply the 16 needed for the 16 Thor nodes. I also worked on improving the efficiency of GNN when it’s run on many GPUs.
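To make the layout concrete, here is a minimal sketch of how 16 Thor slaves might be mapped onto two 8-GPU machines so that each slave process sees exactly one GPU. This is an illustration under my own assumptions, not code from the actual cluster setup: the helper name `gpu_for_thor_slave` and the 1-based slave numbering are hypothetical.

```python
import os

def gpu_for_thor_slave(slave_id, gpus_per_node=8):
    """Map a 1-based Thor slave number to a local GPU index.

    Hypothetical helper: with 16 slaves spread across two 8-GPU
    machines, slaves 1-8 land on GPUs 0-7 of the first GPU node
    and slaves 9-16 on GPUs 0-7 of the second.
    """
    return (slave_id - 1) % gpus_per_node

# Setting CUDA_VISIBLE_DEVICES before the deep-learning framework
# initializes CUDA restricts the process to that single device.
slave_id = 11  # example slave number on the second GPU node
os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_for_thor_slave(slave_id))
print(os.environ["CUDA_VISIBLE_DEVICES"])  # → 2
```

The point of the modulo is that the GPU index is local to each machine, so the same mapping works on both GPU nodes without knowing which machine the slave landed on.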
Robert Kennedy