@@ -111,7 +111,7 @@ The attribute is very similar to **node_cpu_usage_max** but instead of global ma
`NORM` - The maximum `cpu_usage` is in the tolerable range
`LOW` - The maximum `cpu_usage` is lower than the tolerable range
**Calculation**:
...
@@ -129,7 +129,7 @@ The attribute is very similar to **node_cpu_usage_max** but instead of global ma
The attribute for the number of nodes allocated for the job.
It simply divides jobs into two categories: running on a single node and running on multiple nodes.
**Values**:
...
@@ -171,9 +171,7 @@ It is `True` if the node has a high load during the interval `D`.
**Calculation**:
If, during the interval `D` (300 seconds), consecutive measurements of `load1` on the node exceed the number of cores on the node, the value is `True`; otherwise it is `False`.
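A minimal sketch of this check for a single node. It assumes the `load1` samples arrive as `(timestamp_seconds, load1)` pairs sorted by time, and it reads "consecutive measurements during `D`" as an uninterrupted run of samples above the core count that spans at least `D` seconds. The input format and the function name `node_has_high_load` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def node_has_high_load(
    samples: Sequence[Tuple[float, float]],
    cores: int,
    interval: float = 300.0,
) -> bool:
    """Return True if load1 stays above `cores` for at least `interval` seconds.

    `samples` holds (timestamp_seconds, load1) pairs sorted by timestamp
    (hypothetical input format).
    """
    run_start: Optional[float] = None  # first timestamp of the current overloaded run
    for ts, load1 in samples:
        if load1 > cores:
            if run_start is None:
                run_start = ts
            # The run has lasted long enough to count as a high load.
            if ts - run_start >= interval:
                return True
        else:
            # A measurement at or below the core count breaks the run.
            run_start = None
    return False


# Samples every 60 s on a hypothetical 4-core node.
busy = [(t, 5.0) for t in range(0, 301, 60)]
print(node_has_high_load(busy, cores=4))  # True
```

Under this reading, a single sample at or below the core count resets the run. A sliding-window average would be a looser alternative interpretation.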
## mem_swap_used
...
@@ -188,3 +186,96 @@ It is `True` if there is a node where swap memory was used.
**Calculation**:
For every node, check whether the `mem_swap_max` value is non-zero. If it is non-zero for any node, the attribute is set to `True`; otherwise it is `False`.
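The check amounts to a single `any()` over the nodes. This sketch assumes each node is represented as a mapping with a `"mem_swap_max"` key. That representation and the function name are hypothetical.

```python
from typing import Iterable, Mapping


def mem_swap_used(nodes: Iterable[Mapping[str, float]]) -> bool:
    """Return True if any node reports a non-zero `mem_swap_max`.

    Each node is assumed to be a mapping with a "mem_swap_max" key
    (hypothetical representation).
    """
    return any(node["mem_swap_max"] != 0 for node in nodes)


print(mem_swap_used([{"mem_swap_max": 0}, {"mem_swap_max": 128}]))  # True
```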
## gpu_job
It is `True` if the job used at least one GPU.
**Values**:
`True` - if at least one GPU was used
`False` - otherwise
**Calculation**:
For every node, check whether any of its GPUs was used. If a GPU was used on any node, the attribute is set to `True`; otherwise it is `False`.
## gpu_usage_max
This attribute indicates the maximum GPU usage among all GPUs that were running processes of the job.
**Values**:
`HIGH` - The maximum GPU usage is almost 100%
`NORM` - The maximum GPU usage is in the tolerable range
`LOW` - The maximum GPU usage is lower than the tolerable range
`ZERO` - The maximum GPU usage is 0
**Calculation**:
`U` - the usage of a particular GPU, in percent (at most 100%).
`ZERO` = if `U <= 0.5` holds for every GPU
`LOW` = !`ZERO` & if `U < 50` holds for every GPU
`NORM` = !`ZERO` & !`LOW` & if `U < 90` holds for every GPU
`HIGH` = !`ZERO` & !`LOW` & !`NORM`
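A minimal sketch of the classification in Python. It assumes each GPU already has a single usage value `U` in percent; how per-sample readings are aggregated into that value is not specified here. Because the **Values** list describes the largest usage, the thresholds are applied to `max(U)`. The function and constant names are hypothetical.

```python
from typing import Sequence

ZERO_LIMIT = 0.5   # percent
LOW_LIMIT = 50.0   # percent
NORM_LIMIT = 90.0  # percent


def gpu_usage_max(usages: Sequence[float]) -> str:
    """Classify the largest per-GPU usage value (percent, 0..100)."""
    if not usages:
        raise ValueError("expected at least one GPU usage value")
    peak = max(usages)
    if peak <= ZERO_LIMIT:
        return "ZERO"
    if peak < LOW_LIMIT:
        return "LOW"
    if peak < NORM_LIMIT:
        return "NORM"
    return "HIGH"


print(gpu_usage_max([0.0, 35.0]))  # LOW
```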
## gpu_usage_min
Similar to `gpu_usage_max`, but indicates the minimum GPU usage among all GPUs that were running processes of the job.
**Values**:
`HIGH` - The minimum GPU usage is almost 100%
`NORM` - The minimum GPU usage is in the tolerable range
`LOW` - The minimum GPU usage is lower than the tolerable range
`ZERO` - The minimum GPU usage is 0
**Calculation**:
`U` - the usage of a particular GPU, in percent (at most 100%).
`ZERO` = if there exists a GPU for which `U <= 0.5` holds
`LOW` = !`ZERO` & if there exists a GPU for which `U < 50` holds
`NORM` = !`ZERO` & !`LOW` & if there exists a GPU for which `U < 90` holds
`HIGH` = !`ZERO` & !`LOW` & !`NORM`
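A direct translation of the existence checks into Python. It assumes one usage value `U` per GPU, in percent, already aggregated over the job's runtime. Testing "some GPU is below the threshold" in this order is equivalent to comparing `min(U)` against each threshold. The function name is hypothetical.

```python
from typing import Sequence


def gpu_usage_min(usages: Sequence[float]) -> str:
    """Classify per-GPU usage values (percent, 0..100) by their minimum.

    Each check asks whether *some* GPU is below a threshold, which is
    equivalent to comparing min(usages) against that threshold.
    """
    if not usages:
        raise ValueError("expected at least one GPU usage value")
    if any(u <= 0.5 for u in usages):
        return "ZERO"
    if any(u < 50.0 for u in usages):
        return "LOW"
    if any(u < 90.0 for u in usages):
        return "NORM"
    return "HIGH"


print(gpu_usage_min([40.0, 99.0]))  # LOW
```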
## gpus_amount
The attribute for the number of GPUs used during the runtime of the job.
It simply divides jobs into two categories: using a single GPU and using multiple GPUs.
**Values**:
`ONE` - if the number of GPUs equals 1
`MULT` - if the number of GPUs is greater than 1
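A sketch of the two-way split. It assumes the GPUs are reported as identifiers that may repeat, for example one entry per monitoring sample, so each distinct GPU is counted once. The identifier format and the function name are hypothetical. Jobs without GPUs are not covered by the values above, so the sketch rejects them.

```python
from collections.abc import Hashable, Iterable


def gpus_amount(gpu_ids: Iterable[Hashable]) -> str:
    """Return "ONE" or "MULT" from the GPUs seen while the job ran.

    `gpu_ids` may contain repeats (e.g. one entry per monitoring sample);
    each distinct GPU is counted once.
    """
    count = len(set(gpu_ids))
    if count == 1:
        return "ONE"
    if count > 1:
        return "MULT"
    # The values above do not cover jobs without GPUs.
    raise ValueError("job used no GPUs")


print(gpus_amount(["node1:gpu0", "node1:gpu0", "node2:gpu1"]))  # MULT
```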
## gpus_overcrowded_exist
It is `True` if any GPU has multiple processes running on it during the interval `D`.
**Values**:
`True` - if such GPU exists
`False` - otherwise
**Calculation**:
If, during the interval `D` (1800 seconds), consecutive measurements of the number of processes on a GPU exceed 1, the value is `True`; otherwise it is `False`.
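A minimal sketch of this check. It assumes per-GPU samples of `(timestamp_seconds, n_processes)` sorted by time. It reads "consecutive measurements during `D`" as an uninterrupted run of samples with more than one process that spans at least `D` seconds. The input format and the function name are hypothetical.

```python
from typing import Mapping, Optional, Sequence, Tuple


def gpus_overcrowded_exist(
    process_counts: Mapping[str, Sequence[Tuple[float, int]]],
    interval: float = 1800.0,
) -> bool:
    """Return True if some GPU runs more than one process for `interval` seconds.

    `process_counts` maps a GPU identifier to (timestamp_seconds, n_processes)
    pairs sorted by timestamp (hypothetical input format).
    """
    for samples in process_counts.values():
        run_start: Optional[float] = None  # start of the current crowded run
        for ts, n_processes in samples:
            if n_processes > 1:
                if run_start is None:
                    run_start = ts
                if ts - run_start >= interval:
                    return True
            else:
                run_start = None  # a single-process sample breaks the run
    return False


counts = {
    "gpu0": [(t, 1) for t in range(0, 3601, 600)],
    "gpu1": [(t, 2) for t in range(0, 1801, 600)],
}
print(gpus_overcrowded_exist(counts))  # True
```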