« on: October 30, 2017, 10:04 »
1) There is an important difference between your *physical* computing cores and the *MPI* processes you launch. When running an ATK job, you choose both the number of physical cores to allocate and the number of MPI processes to launch on those cores. It is in general highly recommended that the two numbers be identical. If you launch more MPI processes than the cores you have allocated, some cores will run more than one MPI process. So yes, MPI does not stop you from overloading the physical cores with several MPI processes each, which may make the job unstable because several MPI processes must then share the RAM available to only one physical core. It is your own responsibility to avoid this by making sure the number of physical cores matches the number of MPI processes. There may of course be situations where it is advantageous to run fewer MPI processes than allocated cores (e.g. to get more memory per MPI process), but this is rare.
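To make the rule in point 1 concrete, here is a minimal sketch in plain Python of the check you could do by hand before submitting a job. It is not part of ATK or of any MPI library. The function names `max_processes_per_core` and `describe_allocation` are made up for illustration, and it assumes the MPI processes are spread as evenly as possible over the allocated cores.

```python
import math


def max_processes_per_core(n_mpi_processes, n_cores):
    """Largest number of MPI processes that any single core has to host.

    Assumption: processes are distributed as evenly as possible over the cores.
    """
    if n_mpi_processes < 1 or n_cores < 1:
        raise ValueError("both counts must be positive integers")
    return math.ceil(n_mpi_processes / n_cores)


def describe_allocation(n_mpi_processes, n_cores):
    """Classify a job request as 'matched', 'oversubscribed' or 'undersubscribed'."""
    if max_processes_per_core(n_mpi_processes, n_cores) > 1:
        # At least one core runs several MPI processes, which then share
        # the RAM available to that one core.
        return "oversubscribed"
    if n_mpi_processes < n_cores:
        # Fewer processes than cores: more memory per process, idle cores.
        return "undersubscribed"
    return "matched"


if __name__ == "__main__":
    # Hypothetical requests: (MPI processes, allocated physical cores).
    for n_proc, n_cores in [(16, 16), (32, 16), (8, 16)]:
        print(n_proc, n_cores, describe_allocation(n_proc, n_cores))
```

The "matched" case is the recommended one. "undersubscribed" corresponds to the rare situation mentioned above where you deliberately trade cores for more memory per MPI process.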
2) Without knowing the details of your cluster setup, it is hard to say exactly how you should run your ATK jobs to get 100% performance. You mention that your cluster is composed of 2 different sets of cores (A and B). In such a situation, I would usually recommend running each ATK job on only one of those sub-systems, i.e. run job #1 on A and job #2 on B. But again, this depends on the cluster setup. I believe we already have a support ticket on support@quantumwise.com about this. That is a better channel for discussing the details of cluster setups, so I will continue on that one from here.