This is the first week I was able to get my hands on HPCC Systems 7.10, the version of HPCC that will be compatible with Kubernetes, which I first mentioned in Week 3. Containerization brings quite a few benefits, and one of them is, potentially, the ability for Thor to shut down and restart after every workunit. HThor has this functionality, but Thor keeps its processes alive. Importantly, this means it also keeps the Python interpreter alive after a workunit completes. Unfortunately, TensorFlow assumes the interpreter is shut down right after the program ends. As a result, TF does not free the VRAM it has claimed on the GPU while the interpreter is still running.

If you are using Python directly, restarting the interpreter is a fairly automatic part of the workflow. Thor, however, keeps the interpreter alive indefinitely and only closes it when Thor itself shuts down, which happens only when the whole cluster goes down because of a cluster-wide issue, a restart, or a shutdown. The ability to shut down the interpreter at will is very important and should be added as a feature to better accommodate the Python-HPCC user. 7.10 supposedly does this. I will have to wait for the person who knows HPCC best to return from holiday to find out whether this happens, or whether it is even possible.
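To make the interpreter-lifetime point concrete, here is a minimal sketch of a common workaround outside of HPCC: run the GPU work in a short-lived child interpreter, so that everything the child holds is handed back to the operating system when it exits. This is not an HPCC Systems feature and not something I have tested inside Thor. The names `run_in_fresh_interpreter` and `WORKER_SOURCE` are hypothetical, the worker is a stand-in that only sums numbers, and the sketch assumes `sys.executable` points at a real Python binary, which may not hold for an interpreter embedded in Thor.

```python
import json
import os
import subprocess
import sys


def run_in_fresh_interpreter(source: str, payload: dict) -> dict:
    """Run `source` in a new Python process and return its JSON result.

    The payload is passed to the child as JSON on stdin, and the child is
    expected to print a single JSON object on stdout. When the child exits,
    the operating system reclaims everything it held.
    """
    completed = subprocess.run(
        [sys.executable, "-c", source],  # assumption: a usable Python binary
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return json.loads(completed.stdout)


# Hypothetical stand-in for a training job. A real worker would import
# TensorFlow here, inside the child, so the parent never touches the GPU.
WORKER_SOURCE = """
import json, os, sys
payload = json.load(sys.stdin)
result = {"pid": os.getpid(), "total": sum(payload["values"])}
json.dump(result, sys.stdout)
"""


if __name__ == "__main__":
    result = run_in_fresh_interpreter(WORKER_SOURCE, {"values": [1, 2, 3]})
    print("parent pid:", os.getpid())
    print("child pid:", result["pid"], "total:", result["total"])
```

The trade-off is that every call pays the cost of starting a new interpreter and of serializing inputs and outputs, which is why a built-in way to restart the interpreter between workunits would still be the better fix.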
Robert Kennedy