This week the code (at least in terms of major changes) is going to remain unchanged, other than cleaning it up to make a repo that others can use. No one wants to (or should have to) read poorly commented code! That being said, some performance benchmarks are needed. I have decided to use 3 models of varying size and train them on 3 datasets of varying size. To keep things as consistent as possible, the three models' sizes will be arbitrarily increased, and the 3 datasets will simply be sampled with replacement to reach the 3 sizes. This should reduce the number of confounding variables when measuring how performance changes as model and dataset size grow across different cluster sizes. I will use 1, 2, 4, 8, and 16 GPU systems. I will also run some CPU experiments, but these will be severely limited by the remaining time, so only a few might be completed.
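To make the setup concrete, here is a minimal sketch of how the benchmark grid and the upsampled datasets could be built. The model labels, dataset sizes, and function names are placeholders I'm using for illustration, not the actual project code:

```python
import itertools
import numpy as np

# Hypothetical benchmark grid: three model sizes, three dataset sizes,
# and the cluster sizes to test on (placeholder values).
MODEL_SCALES = ["small", "medium", "large"]
DATASET_SIZES = [10_000, 100_000, 1_000_000]
GPU_COUNTS = [1, 2, 4, 8, 16]


def upsample_with_replacement(dataset, target_size, seed=0):
    """Draw `target_size` examples from `dataset` with replacement, so all
    three benchmark datasets come from the same underlying data."""
    rng = np.random.default_rng(seed)
    indices = rng.integers(0, len(dataset), size=target_size)
    return [dataset[i] for i in indices]


def benchmark_grid():
    """Yield every (model scale, dataset size, GPU count) combination to run."""
    return itertools.product(MODEL_SCALES, DATASET_SIZES, GPU_COUNTS)


if __name__ == "__main__":
    for model_scale, n_samples, n_gpus in benchmark_grid():
        print(f"run: model={model_scale}, samples={n_samples}, gpus={n_gpus}")
```

Iterating over the full grid like this keeps every run comparable: only the model size, dataset size, and number of GPUs change between runs.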
Once this is done, I will graph the results, present them, and wrap everything up!