ProfiT-HPC / aggregator
Commit 1953b9c2, authored Aug 05, 2020 by Azat Khuziyakhmetov
added the DB spec
parent cc52d827
docs/InfluxDBspec.md (0 → 100644)
# InfluxDB specification
This file specifies the values stored in InfluxDB.
The Grafana, ASCII and PDF columns denote whether a metric is used for that output.
In these columns, `MIN` marks the minimal set of measurements and `EXT` marks the extended set.
## Batch system data
Collected by the post-execution script `exportjobinfo.py` as soon as a job finishes.

Measurement: `pfit-jobinfo`.
| Key | Value | Info | Grafana | ASCII | PDF |
| -------------- | -------------- | --------------------------------------------------------------------------- | ------- | ----- | --- |
| jobid | string\* | Job ID in the batch system | MIN | MIN | MIN |
| uname | string\* | user name in HPC | MIN | | |
| user_name | string | user name in HPC | MIN | MIN | MIN |
| used_queue | string | used queue in the batch system | MIN | MIN | MIN |
| num_nodes | integer | number of nodes the job runs on | MIN | MIN | MIN |
| requested_cu | integer | requested compute units (slots/cores/...) | MIN | MIN | MIN |
| requested_time | seconds | requested time | MIN | MIN | MIN |
| submit_time | UNIX timestamp | submitted time | MIN | MIN | MIN |
| start_time | UNIX timestamp | start time | MIN | MIN | MIN |
| end_time | UNIX timestamp | end time | MIN | MIN | MIN |
| run_time | seconds | duration of the job | MIN | | |
| alloc_cu | string | allocation of the job on nodes. Format: `CU1*NODE1:CU2*NODE2:...:CU#*NODE#` | | MIN | MIN |
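
The `alloc_cu` format above can be unpacked into a per-node mapping. The following is a minimal sketch, not part of the aggregator itself: the function name `parse_alloc_cu` is hypothetical, and it assumes that node names contain neither `:` nor `*` and that a node listed twice should have its compute units summed.

```python
from typing import Dict


def parse_alloc_cu(alloc_cu: str) -> Dict[str, int]:
    """Parse 'CU1*NODE1:CU2*NODE2' into {node: compute_units}."""
    allocation: Dict[str, int] = {}
    for entry in alloc_cu.split(":"):
        if not entry:
            continue  # tolerate an empty string or a trailing ':'
        count, sep, node = entry.partition("*")
        if not sep or not node:
            raise ValueError(f"malformed alloc_cu entry: {entry!r}")
        # Assumption: repeated nodes are summed rather than overwritten.
        allocation[node] = allocation.get(node, 0) + int(count)
    return allocation


if __name__ == "__main__":
    print(parse_alloc_cu("4*node01:8*node02"))
```

Each key of the resulting mapping would then correspond to one entry in a per-node measurement.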
Measurement: `pfit-jobinfo-alloc`.

For every node allocated to the job, there should be exactly one entry in this measurement.
| Key | Value | Info | Grafana | ASCII | PDF |
| --------- | -------- | -------------------------- | ------- | ----- | --- |
| host | string\* | Hostname of the node | MIN | MIN | MIN |
| jobid | string\* | Job ID in the batch system | MIN | MIN | MIN |
| alloc_cu | integer | allocated amount of CUs | MIN | MIN | MIN |
| alloc_mem | integer | allocated amount of memory | MIN | MIN | MIN |
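
To illustrate the one-entry-per-node rule, the sketch below renders such entries as InfluxDB line protocol, with `host` and `jobid` as tags and the two allocation values as integer fields. The helper names, the input layout, and the nanosecond timestamp are assumptions made for the example; the spec does not state the unit of `alloc_mem`, so the value is passed through unchanged.

```python
from typing import Dict, List


def escape_tag(value: str) -> str:
    """Escape commas, equals signs and spaces in a line-protocol tag value."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def alloc_lines(jobid: str, nodes: Dict[str, Dict[str, int]],
                timestamp_ns: int) -> List[str]:
    """Build one line-protocol entry per allocated node."""
    lines = []
    for host, alloc in sorted(nodes.items()):
        tags = f"host={escape_tag(host)},jobid={escape_tag(jobid)}"
        # The trailing 'i' marks integer fields in line protocol.
        fields = f"alloc_cu={alloc['alloc_cu']}i,alloc_mem={alloc['alloc_mem']}i"
        lines.append(f"pfit-jobinfo-alloc,{tags} {fields} {timestamp_ns}")
    return lines


if __name__ == "__main__":
    example = {"node01": {"alloc_cu": 4, "alloc_mem": 8192}}
    for line in alloc_lines("1234", example, 1596620000000000000):
        print(line)
```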
## Node data
Collected by the Telegraf plugin `pfit-nodeinfo` at some feasible interval (minute/hour/day).

Measurement: `pfit-nodeinfo`.
| Key | Value | Info | Grafana | ASCII | PDF |
| ---------------- | -------- | -------------------- | ------- | ----- | --- |
| host | string\* | Hostname of the node | | MIN | MIN |
| cores_per_socket | integer | cores per socket | | MIN | MIN |
| cpu_model | string | CPU model | | MIN | MIN |
| main_mem | bytes | total amount of RAM | | MIN | MIN |
| sockets | integer | number of sockets | | MIN | MIN |
| threads_per_core | integer | threads per core | | MIN | MIN |
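
The topology fields combine into core counts in the usual way. The sketch below assumes a node record has already been read into a plain dictionary keyed by the field names above; the function name and the derived keys are illustrative only.

```python
from typing import Dict


def node_summary(node: Dict[str, int]) -> Dict[str, int]:
    """Derive core counts and memory per physical core from a node record."""
    physical_cores = node["sockets"] * node["cores_per_socket"]
    logical_cpus = physical_cores * node["threads_per_core"]
    return {
        "physical_cores": physical_cores,
        "logical_cpus": logical_cpus,
        # main_mem is given in bytes; integer division keeps the result in bytes.
        "mem_per_core": node["main_mem"] // physical_cores,
    }


if __name__ == "__main__":
    record = {"sockets": 2, "cores_per_socket": 12,
              "threads_per_core": 2, "main_mem": 96 * 1024**3}
    print(node_summary(record))
```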
## Process data
Collected by the Telegraf plugin `pfit-uprocstat` at a higher frequency (by default, every 10 seconds) for every user process in the system (processes only, not individual threads).

Measurement: `pfit-uprocstat`.
| Key | Value | Info | Grafana | ASCII | PDF |
| ---------------------------- | -------- | ---------------------------- | ------- | ----- | --- |
| host | string\* | Hostname of the node | MIN | MIN | MIN |
| jobid1 | string\* | Job ID | MIN | MIN | MIN |
| jobid2 | string\* | Job ID | | | |
| pid | string\* | PID (Process ID) | MIN | MIN | MIN |
| process_name | string\* | process name | MIN | | |
| uid | string\* | UID of process owner | | | |
| user | string\* | Username of process owner | MIN | MIN | MIN |
| cpu_time_idle | float | CPU idle time | | | |
| cpu_time_iowait | float | CPU iowait time | | | |
| cpu_time_system | float | CPU system time | MIN | MIN | MIN |
| cpu_time_user | float | CPU user time | MIN | MIN | MIN |
| cpu_time_guest | float | CPU guest time | | | |
| cpu_time_guest_nice | float | CPU guest nice time | | | |
| cpu_time_irq | float | CPU irq time | | | |
| cpu_time_nice | float | CPU nice time | | | |
| cpu_time_soft_irq | float | CPU soft irq time | | | |
| cpu_time_steal | float | CPU steal time | | | |
| cpu_time_stolen | float | CPU stolen time | | | |
| cpu_usage | float | CPU usage | MIN | MIN | MIN |
| involuntary_context_switches | integer | involuntary context switches | | | MIN |
| voluntary_context_switches | integer | voluntary context switches | | | MIN |
| memory_rss | bytes | memory RSS | MIN | MIN | MIN |
| memory_swap | bytes | memory SWAP | MIN | MIN | MIN |
| memory_vms | bytes | VMS | MIN | MIN | MIN |
| num_fds | integer | number of file descriptors | | | MIN |
| num_threads | integer | number of threads | | | |
| read_bytes | bytes | bytes read | MIN | MIN | MIN |
| read_count | integer | read count | MIN | MIN | MIN |
| write_bytes | bytes | bytes written | MIN | MIN | MIN |
| write_count | integer | write count | MIN | MIN | MIN |
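
Because this measurement holds one point per process, per-job figures have to be aggregated from it. The following sketch sums `memory_rss` and `cpu_usage` per job for samples taken at the same collection time. It assumes the points are available as dictionaries and that `jobid1` is the tag that identifies the batch job; the spec describes both `jobid1` and `jobid2` only as "Job ID", so that choice is a guess.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def job_totals(samples: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Sum memory and CPU usage over all processes of each job."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"memory_rss": 0, "cpu_usage": 0.0, "processes": 0}
    )
    for sample in samples:
        job = totals[sample["jobid1"]]  # assumption: jobid1 is the batch job ID
        job["memory_rss"] += sample["memory_rss"]
        job["cpu_usage"] += sample["cpu_usage"]
        job["processes"] += 1
    return dict(totals)


if __name__ == "__main__":
    points = [
        {"jobid1": "100", "pid": "4711", "memory_rss": 100, "cpu_usage": 50.0},
        {"jobid1": "100", "pid": "4712", "memory_rss": 200, "cpu_usage": 25.0},
        {"jobid1": "200", "pid": "4800", "memory_rss": 50, "cpu_usage": 10.0},
    ]
    print(job_totals(points))
```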
## Swap data
Collected by the Telegraf plugin `swap` at a higher frequency (by default, every 10 seconds) for every node.

Measurement: `swap`.
| Key | Value | Info | Grafana | ASCII | PDF |
| ------------ | -------- | ------------------------------------------------------------ | ------- | ----- | --- |
| host | string\* | Hostname of the node | | | |
| free | bytes | free swap memory | | | MIN |
| in | bytes | data swapped in since last boot calculated from page number | | | |
| out | bytes | data swapped out since last boot calculated from page number | | | |
| total | bytes | total swap memory | | | |
| used | bytes | used swap memory | | | MIN |
| used_percent | float | percentage of swap memory used | | | |
## CPU data
Collected by the Telegraf plugin `cpu` at a higher frequency (by default, every 10 seconds) for every CPU of the node, plus the totals across all CPUs.

Measurement: `cpu`.
| Key | Value | Info | Grafana | ASCII | PDF |
| ---------------- | -------- | --------------------- | ------- | ----- | --- |
| host | string\* | Hostname of the node | MIN | | MIN |
| cpu | string\* | `<cpuN>` or `<cpu-total>` | MIN | | |
| usage_guest | float | CPU usage guest | MIN | | |
| usage_guest_nice | float | CPU usage guest nice | | | |
| usage_idle | float | CPU usage idle | MIN | | MIN |
| usage_iowait | float | CPU usage iowait | MIN | | MIN |
| usage_irq | float | CPU usage irq | MIN | | |
| usage_nice | float | CPU usage nice | MIN | | |
| usage_softirq | float | CPU usage softirq | | | |
| usage_steal | float | CPU usage steal | | | |
| usage_system | float | CPU usage system | MIN | | MIN |
| usage_user | float | CPU usage user | MIN | | MIN |
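
Since the measurement mixes per-CPU points with a node-wide total, consumers usually need to pick one of the two. The sketch below selects the total point and derives a busy percentage from `usage_idle`. It assumes the points are dictionaries, that the total carries the tag value `cpu-total`, and that the `usage_*` fields are percentages; note that this definition counts iowait as busy time.

```python
from typing import Any, Iterable, Mapping, Optional


def total_busy_percent(points: Iterable[Mapping[str, Any]]) -> Optional[float]:
    """Return 100 - usage_idle for the node-wide point, or None if it is absent."""
    for point in points:
        if point["cpu"] == "cpu-total":  # skip the per-CPU points (cpu0, cpu1, ...)
            return 100.0 - point["usage_idle"]
    return None


if __name__ == "__main__":
    sample = [
        {"host": "node01", "cpu": "cpu0", "usage_idle": 75.0},
        {"host": "node01", "cpu": "cpu1", "usage_idle": 100.0},
        {"host": "node01", "cpu": "cpu-total", "usage_idle": 87.5},
    ]
    print(total_busy_percent(sample))
```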
## Memory data
Collected by the Telegraf plugin `mem` at a higher frequency (by default, every 10 seconds) for every node.

Measurement: `mem`.
| Key | Value | Info | Grafana | ASCII | PDF |
| ----------------- | -------- | --------------------------- | ------- | ----- | --- |
| host | string\* | Hostname of the node | MIN | | MIN |
| active | bytes | active memory | MIN | | |
| available | bytes | available memory | MIN | | MIN |
| available_percent | float | available percent of memory | MIN | | MIN |
| buffered | bytes | buffered memory | | | |
| cached | bytes | cached memory | | | |
| free | bytes | free memory | MIN | | |
| inactive | bytes | inactive memory | | | |
| total | bytes | total memory | MIN | | MIN |
| used | bytes | used memory | MIN | | MIN |
| used_percent | float | used percent of memory | MIN | | MIN |
## System data
Collected by the Telegraf plugin `system` at a higher frequency (by default, every 10 seconds) for every node.

Measurement: `system`.
| Key | Value | Info | Grafana | ASCII | PDF |
| ------------- | -------- | --------------------------- | ------- | ----- | --- |
| host | string\* | Hostname of the node | MIN | | MIN |
| load1 | float | load1 | MIN | | MIN |
| load5 | float | load5 | | | |
| load15 | float | load15 | | | |
| n_cpus | integer | Number of CPUs | MIN | | |
| n_users | integer | Number of users on the host | | | |
| uptime | seconds | Uptime | | | |
| uptime_format | string | Formatted uptime | | | |
## BeeGFS data
Collected by the Telegraf plugin script `beegfs_clients.sh`.

Measurement: `beegfs_clients`.
| Key | Value | Info | Grafana | ASCII | PDF |
| ------ | -------- | ------------------------ | ------- | ----- | --- |
| host | string\* | Hostname of the node | EXT | EXT | EXT |
| MiB-rd | float | MiBs read by the host | EXT | EXT | EXT |
| MiB-wr | float | MiBs written by the host | EXT | EXT | EXT |
## InfiniBand data
Collected by the Telegraf plugin script `infiniband.sh`.

Measurement: `infiniband`.
| Key | Value | Info | Grafana | ASCII | PDF |
| ------------ | -------- | ---------------------------- | ------- | ----- | --- |
| host | string\* | Hostname of the node | EXT | EXT | EXT |
| PortXmitData | float | Data transmitted by the host | EXT | EXT | EXT |
| PortRcvData | float | Data received by the host | EXT | EXT | EXT |
## NVIDIA GPU data
Collected by the Telegraf plugin script `nvidiatotal.sh`.

Measurement: `nvidia_gpu`.
| Key | Value | Info | Grafana | ASCII | PDF |
| --------------- | -------- | --------------------------------- | ------- | ----- | --- |
| host | string\* | Hostname of the node | EXT | EXT | EXT |
| bus | string\* | The Bus ID of the GPU on the node | EXT | EXT | EXT |
| gpu_name | string\* | GPU type | EXT | EXT | EXT |
| power.limit | float | The power limit of the GPU | EXT | | |
| memory.total | float | Total memory of GPU | EXT | | EXT |
| temperature.gpu | float | The temperature of GPU | EXT | | EXT |
| power.draw | float | Power draw of GPU | EXT | | EXT |
| memory.used | float | Used memory of GPU | EXT | EXT | EXT |
| utilization.gpu | float | GPU utilization | EXT | EXT | EXT |
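
Several keys in this table contain dots, and some measurement names elsewhere in this file contain hyphens. If the data is read with InfluxQL (an assumption; the spec does not name a query language), such identifiers have to be double-quoted, while tag values are compared as single-quoted strings. The following sketch builds such a query; all function names are hypothetical.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Double-quote an InfluxQL identifier, escaping backslashes and quotes."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def quote_str(value: str) -> str:
    """Single-quote an InfluxQL string literal, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def gpu_query(host: str,
              fields: Sequence[str] = ("memory.used", "memory.total")) -> str:
    """Build a SELECT over nvidia_gpu restricted to one host."""
    columns = ", ".join(quote_ident(field) for field in fields)
    measurement = quote_ident("nvidia_gpu")
    condition = quote_ident("host") + " = " + quote_str(host)
    return f"SELECT {columns} FROM {measurement} WHERE {condition}"


if __name__ == "__main__":
    print(gpu_query("gpu01"))
```

Quoting every identifier, including ones that do not strictly need it, keeps the builder simple and avoids clashes with InfluxQL keywords.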
## NVIDIA GPU process data
Collected by the Telegraf plugin script `nvidiatotal.sh`.

Measurement: `nvidia_proc`.
| Key | Value | Info | Grafana | ASCII | PDF |
| -------- | -------- | ----------------------------------- | ------- | ----- | --- |
| host | string\* | Hostname of the node | EXT | EXT | EXT |
| bus | string\* | The Bus ID of the GPU on the node | EXT | EXT | EXT |
| gpu_name | string\* | GPU type | EXT | EXT | EXT |
| JOBID | string\* | JOBID the process belongs to | EXT | EXT | EXT |
| username | string\* | Username the process belongs to | EXT | EXT | EXT |
| cpu_pcpu | float | CPU usage percentage of the process | EXT | EXT | EXT |
| cpu_rss | float | Memory used by the process | EXT | EXT | EXT |
| pid | float | The process PID | EXT | EXT | EXT |

\* Tag keys