Commit 85312f96 authored by Igor Merkulow

More modifications for automatic parsing

parent 8aa13833
@@ -9,14 +9,15 @@ Additional values that are not covered by this specification are allowed, but
## Data subsets
In the combined interface for all the reports, we have to deal with multiple kinds of data, each used in a different context. At the moment we can identify five of them:
|JSON name|Explanation|
| --- | --- |
|general|Global job data: valid for the whole job and does not change during the entire job runtime (e.g. job ID, user name).|
|static|Static node data: node-related data that does not change for the duration of the job (e.g. node name, CPU model, RAM amount).|
|dynamic|Dynamic node data: data samples (e.g. disk reads, memory usage), converted to our format specification.|
|aggregates|Aggregates per node: data aggregated over the job runtime for every node separately (e.g. maximum CPU load, total number of packets sent over the network).|
|totals|Aggregates per job: data aggregated over all nodes (e.g. whether swap was used).|
Global job-related and static node-related data is displayed in the reports more or less as is, either as part of the header or as a baseline for calculations. Dynamic data samples are used for the time-series plots in the PDF report and for aggregates, and may be used for calculating recommendations, but most probably will not be displayed in raw form (due to the data volume). Aggregated and total values are used to estimate job efficiency and can also be shown in the reports.
TODO: network data is still missing. We need to define the supported network types, what kind of data is needed, how it should be aggregated, and what kind of information can be derived from it.
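As an illustration only (the specification does not yet fix the exact nesting), one plausible layout of a combined report groups the metrics under the five JSON names from the table above. The metric names are taken from the tables in this document; the values and the grouping are made up:

```python
# Hypothetical skeleton of one combined report; the grouping under the five
# subset keys is an assumption, not part of the specification yet.
report = {
    "general": {"pfit_sampling_interval": "30s", "pfit_return_value": 0},
    "static": [{"pfit_node_name": "node01", "pfit_sockets_per_node": 2}],
    "dynamic": [{"pfit_node_name": "node01", "pfit_timestamp": 1700000000}],
    "aggregates": [{"pfit_node_name": "node01", "pfit_used_swap_node_max": 0}],
    "totals": {},  # per-job aggregates, e.g. whether swap was used
}
print(sorted(report))
```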
@@ -26,31 +27,32 @@ TODO: how to deal with partial data? E.g. if not all file systems used are supported
TODO: do we already have metrics, supported by all tools/plugins, but not included in the reports?
## Value Constraints and Units
Table of constraints:
|Abbreviation|Constraint|Explanation|
| --- | --- | --- |
|STR|"[a-zA-Z0-9_]{3,45}"|string is limited to upper- and lower-case ASCII characters, numbers, and underscores; between 3 and 45 characters long.|
|STR-EXT|"[a-zA-Z0-9_ \(\)@\.,\-]{3,45}"|string can also contain whitespace and other characters (e.g. punctuation, '@' or parentheses); length still between 3 and 45 characters.|
|INT-POS|">0"|value is strictly greater than zero.|
|INT-POS0|">=0"|value is zero or greater.|
|INT-TS|">=1000000000"|value is a UNIX timestamp (seconds since 01.01.1970); the minimum value 1,000,000,000 corresponds to 09.09.2001, so we should only see larger values. If this constraint is not met, there are probably other issues on that machine.|
|INT-01|"[01]"|value is either 0 or 1.|
|FLOAT-POS0|">=0.0"|floating point value, greater than or equal to 0.0.|
|FLOAT-PERCENT|">=0.0", "<=1.0"|floating point value between 0 and 1 (both inclusive).|
|MEM-PR|">=10485760", "<=10485760*1048576"|plausibility check for memory amounts; the default range is [10 MB, 10 TB].|
|SAMPL|"[0-9]{1,5}[hms]"|sampling interval in the form "1s" or "24h" etc.|
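A minimal sketch of how a few of these constraints could be checked in Python; the helper names are ours and not part of the parser shown later in this commit:

```python
import re

# Regex constraints from the table above; re.fullmatch anchors the pattern
# so the whole string must match.
PATTERNS = {
    "STR": r"[a-zA-Z0-9_]{3,45}",
    "STR-EXT": r"[a-zA-Z0-9_ ()@.,\-]{3,45}",
    "SAMPL": r"[0-9]{1,5}[hms]",
}

def check_pattern(constraint, value):
    """True if the whole value matches the named regex constraint."""
    return re.fullmatch(PATTERNS[constraint], value) is not None

def check_mem_pr(num_bytes):
    """MEM-PR plausibility range: [10 MB, 10 TB]."""
    return 10485760 <= num_bytes <= 10485760 * 1048576

print(check_pattern("STR", "pfit_node_name"))  # True
print(check_pattern("SAMPL", "1m30s"))         # False: single number+unit only
print(check_mem_pr(8 * 1024**3))               # True: 8 GB is plausible RAM
```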
Used unit abbreviations:
|Abbreviation|Unit|Explanation|
| --- | --- | --- |
|BYTE|byte(s)|value is in bytes, thus only non-negative integers are allowed|
|SEC|s|value is in seconds, thus only non-negative integers are allowed|
|NSEC|ns|value is in nanoseconds|
|HZ|Hz|value is in Hertz|
|MBS|MB/s|Megabytes per second|
A question mark means that the data is not available yet; a minus sign means "not applicable".
@@ -78,7 +80,7 @@ Additional explanations:
- `pfit_requested_time` should roughly be equal to (end_time - start_time). Upper limit is one year.
- `pfit_requested_cores` is aggregated over all nodes (total sum).
- `pfit_num_used_nodes` has to be equal to the number of node-related data blocks in the set.
- `pfit_sampling_interval` is a value that we set in the configuration, so it should be identical for all nodes, but it can also be aggregated if necessary (e.g. the shortest interval should be stated here). TODO: define how exactly the interval is specified (e.g. whether "1m30s" should be allowed or only "90s") and adapt the RegEx. Maybe an integer value in seconds would be better. Currently, only lengths between 2 and 6 are allowed by the validator.
- `pfit_return_value` should indicate whether the job finished correctly. It is not necessarily the result delivered by the job management system, since programs can also have negative exit codes.
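The cross-field checks described in the explanations above can be sketched as follows; the helper and the field names `pfit_start_time` / `pfit_end_time` are assumptions for illustration, not part of the specification:

```python
ONE_YEAR = 365 * 24 * 3600  # upper limit for the requested time, in seconds

def check_job_block(job, node_blocks):
    """Plausibility checks derived from the explanations above.
    The pfit_start_time/pfit_end_time field names are assumed."""
    runtime = job["pfit_end_time"] - job["pfit_start_time"]
    return (0 < job["pfit_requested_time"] <= ONE_YEAR
            # the requested time should roughly bound the actual runtime
            and runtime <= job["pfit_requested_time"]
            # one node-related data block per used node
            and job["pfit_num_used_nodes"] == len(node_blocks))

job = {"pfit_requested_time": 3600, "pfit_start_time": 1700000000,
       "pfit_end_time": 1700003000, "pfit_num_used_nodes": 2}
print(check_job_block(job, [{}, {}]))  # True: 3000 s runtime, 2 node blocks
```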
## Metrics per node
@@ -90,8 +92,8 @@ Additional explanations:
|`pfit_node_name`|string|STR|-|y|Identifier of a node, has to be unique for the job|
|`pfit_cpu_model`|string|STR-EXT|-|y|CPU vendor and model name|
|`pfit_available_main_mem`|integer|MEM-PR|BYTE|y|Node's total RAM amount|
|`pfit_mem_latency`|float|FLOAT-POS0|NSEC|n|RAM latency, default value|
|`pfit_mem_bw`|float|FLOAT-POS0|MBS|n|RAM bandwidth, default value|
|`pfit_sockets_per_node`|integer|Range [1, 16]|-|y|Number of CPU sockets for this node|
|`pfit_cores_per_socket`|integer|Range [1, 1024]|-|y|Number of actual CPU cores for every socket|
|`pfit_phys_threads_per_core`|integer|Range [1, 1024]|-|y|How many HW threads can be executed on a CPU core|
@@ -120,7 +122,7 @@ Additional explanations:
|Report metric|Data type|Constraint|Unit|Required|Label / Caption|
| --- | --- | --- | --- | --- | --- |
|`pfit_timestamp`|integer|INT-POS0|NSEC|y|Time stamp identifying the metrics|
|`pfit_cpu_time_user`|integer|INT-POS0|SEC|y|Part of the total walltime spent in the user code|
|`pfit_cpu_time_system`|integer|INT-POS0|SEC|y|Part of the total walltime in system calls or kernel processes|
|`pfit_cpu_time_idle`|integer|INT-POS0|SEC|y|Part of the total walltime in the idle task|
@@ -135,7 +137,7 @@ Additional explanations:
|`pfit_num_processes`|integer|INT-POS0|-|y|Number of processes on the node|
|`pfit_load1`|float|FLOAT-POS0|-|n|Weighted average number of processes waiting|
|`pfit_total_context_switches`|integer|INT-POS0|-|n|Total amount of context switches|
|`pfit_frequency_per_cpuX`|integer|INT-POS0|HZ|n|Current CPU frequency for CPU X|
Additional explanations:
@@ -166,10 +168,10 @@ IMPORTANT: Average values are often floats. Since the floating point value of e.
|`pfit_mem_rss_node_max`|integer|MEM-PR|BYTE|y|Max RSS memory statistics for this node|
|`pfit_mem_rss_node_avg`|integer|MEM-PR|BYTE|y|Average RSS memory statistics for this node|
|`pfit_used_swap_node_max`|integer|INT-POS0|BYTE|y|Max used swap on this node|
|`pfit_frequency_per_cpuX_node_min`|integer|INT-POS0|HZ|n|Min frequency aggregated per CPU|
|`pfit_frequency_per_cpuX_node_max`|integer|INT-POS0|HZ|n|Max frequency aggregated per CPU|
|`pfit_frequency_per_cpuX_node_avg`|integer|INT-POS0|HZ|n|Average frequency aggregated per CPU|
|`pfit_frequency_per_cpuX_node_median`|integer|INT-POS0|HZ|n|Median frequency aggregated per CPU|
Additional explanations:
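A sketch of how the per-CPU frequency aggregates above could be derived from dynamic samples; the aggregation code is ours and the samples are invented. Averages and medians are rounded to keep the integer type required by the table:

```python
from statistics import median

samples = [2200000000, 2400000000, 2600000000, 2600000000]  # Hz, invented

agg = {
    "pfit_frequency_per_cpu0_node_min": min(samples),
    "pfit_frequency_per_cpu0_node_max": max(samples),
    # rounded so that the aggregates stay integers, as the table requires
    "pfit_frequency_per_cpu0_node_avg": round(sum(samples) / len(samples)),
    "pfit_frequency_per_cpu0_node_median": round(median(samples)),
}
print(agg["pfit_frequency_per_cpu0_node_avg"])     # 2450000000
print(agg["pfit_frequency_per_cpu0_node_median"])  # 2500000000
```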
@@ -88,9 +88,32 @@ def _convert(name, datatype, constraint, unit, required, label):
    return res
def datatable2list(table, tname=None):
    res = []
    for j in table:
        name = j[0][1:-1]  # strip the surrounding backticks
        if not name.startswith('pfit'):
            continue
        res.append(_convert(name, j[1], j[2], j[3], j[4], j[5]))
    if tname is None:
        return {'table': res}
    else:
        return {tname: res}

def tnames2list(table):
    return [i[0] for i in table[2:]]  # skip header and separator rows

def constraints2dict(table):
    return {i[0]: i[1] for i in table[2:]}

def units2dict(table):
    return {i[0]: i[1] for i in table[2:]}

def extract_tables(data):
    block = False
    res = []
    table = []
@@ -99,13 +122,7 @@ def parse(fname):
        if l.startswith('|'):
            if not block:
                block = True
            table.append(l.split('|')[1:-1])
        else:
            if block:
                block = False
@@ -118,5 +135,16 @@ def parse(fname):
if __name__ == '__main__':
    import sys
    from pprint import pp  # pretty-printer for the parsed tables

    with open(sys.argv[1], 'r') as f:
        data = f.readlines()
    tables = extract_tables(data)
    tnames = tnames2list(tables[0])
    print(tnames)
    constraints = constraints2dict(tables[1])
    print(constraints)
    units = units2dict(tables[2])
    print(units)
    for i in zip(tables[3:], tnames):
        t = datatable2list(i[0], i[1])
        pp(t)
        print()
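The `split('|')[1:-1]` idiom used in `extract_tables` and the backtick-stripping in `datatable2list` can be checked in isolation on a sample table row:

```python
row = "|`pfit_node_name`|string|STR|-|y|Identifier of a node|"
cells = row.split('|')[1:-1]  # drop the empty fields outside the edge pipes
print(cells[0])               # `pfit_node_name` (still with backticks)
name = cells[0][1:-1]         # strip the backticks
print(name)                   # pfit_node_name
print(len(cells))             # 6 columns
```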