balance workers across nodes
Currently, the list of worker processes generated by `hiopy configure processes` tends to group resource-intensive workers together. When this list is used as-is to run the workers on SLURM, the heavy workers tend to land on the same nodes, which leads to an imbalance in node utilization. A better option might be to cycle through the nodes when scheduling workers.
For example, there could be an option like `--cycle n_nodes` in `hiopy configure processes` that would order the processes differently. An example would be:
Current order:
w1
w2
w3
w4
w5
w6
With `--cycle 3`:
w1
w4
w2
w5
w3
w6
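The proposed ordering can be sketched as a round-robin deal: worker *i* (0-based) goes to node `i % n_nodes`, and the per-node groups are then concatenated. The helpers `cycle_order` and `block_split` below are hypothetical names, not part of hiopy. The sketch assumes the workers are available as a plain list and that, when launched, consecutive entries of the list are placed on the same node (as with a block-style task distribution); `block_split` only mimics that placement to show which workers would share a node.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cycle_order(workers: Sequence[T], n_nodes: int) -> List[T]:
    """Hypothetical reordering for a `--cycle n_nodes` option.

    Worker i (0-based) is dealt to node i % n_nodes; the per-node groups
    are then concatenated in node order, so neighbours in the original
    list end up in different contiguous blocks.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be >= 1")
    # workers[k::n_nodes] is every n_nodes-th worker starting at k,
    # i.e. the workers dealt to node k.
    return [w for k in range(n_nodes) for w in workers[k::n_nodes]]


def block_split(ordered: Sequence[T], n_nodes: int) -> List[List[T]]:
    """Split a list into n_nodes contiguous blocks whose sizes differ by at
    most one (larger blocks first), mimicking block-style placement."""
    q, r = divmod(len(ordered), n_nodes)
    blocks: List[List[T]] = []
    start = 0
    for k in range(n_nodes):
        size = q + (1 if k < r else 0)
        blocks.append(list(ordered[start:start + size]))
        start += size
    return blocks


if __name__ == "__main__":
    workers = ["w1", "w2", "w3", "w4", "w5", "w6"]
    ordered = cycle_order(workers, 3)
    print(ordered)                  # ['w1', 'w4', 'w2', 'w5', 'w3', 'w6']
    print(block_split(ordered, 3))  # [['w1', 'w4'], ['w2', 'w5'], ['w3', 'w6']]
```

With six workers on three nodes, adjacent workers such as `w1` and `w2` (which may both be resource-intensive) are now placed on different nodes. When the number of workers is not a multiple of `n_nodes`, the first nodes receive one extra worker each, matching the sizes that `block_split` produces.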