From 93956f8e9c2e9a16dfae907bf0f9270fb4c592bb Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <unixprog@gmail.com>
Date: Thu, 24 Oct 2019 19:36:04 +0200
Subject: [PATCH] update repos

---
 repos.json | 1136 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 882 insertions(+), 254 deletions(-)

diff --git a/repos.json b/repos.json
index 0c0d7cb..dd67cfb 100644
--- a/repos.json
+++ b/repos.json
@@ -2,13 +2,128 @@
     {
         "files": {
             "Dockerfile": null,
-            "README.md": "# ocrd_calamari\n\nRecognize text using [Calamari OCR](https://github.com/Calamari-OCR/calamari).\n\n## Introduction\n\nThis offers a OCR-D compliant workspace processor for some of the functionality of Calamari OCR.\n\nThis processor only operates on the text line level and so needs a line segmentation (and by extension a binarized \nimage) as its input.\n\n## Installation\n\n### From PyPI\n\n:construction: :construction: :construction: :construction: :construction: :construction: :construction:\n\n```\npip install ocrd_calamari\n```\n\n### From Repo\n\n```sh\npip install .\n```\n\nTo install the calamari with the GPU version of Tensorflow:\n\n```sh\npip install 'calamari-ocr[tf_cpu]'\npip install .\n```\n\n## Example Usage\n\n~~~\nocrd-calamari-recognize -p test-parameters.json -m mets.xml -I OCR-D-SEG-LINE -O OCR-D-OCR-CALAMARI\n~~~\n\nWith `test-parameters.json`:\n~~~\n{\n    \"checkpoint\": \"/path/to/some/trained/models/*.ckpt.json\"\n}\n~~~\n\nTODO\n----\n\n* Support Calamari's \"extended prediction data\" output\n* Currently, the processor only supports a prediction using confidence voting of multiple models. While this is\n  superior, it makes sense to support single model prediction, too.\n",
+            "README.md": "# cor-asv-ann\n    OCR post-correction with encoder-attention-decoder LSTMs\n\n## Introduction\n\nThis is a tool for automatic OCR _post-correction_ (reducing optical character recognition errors) with recurrent neural networks. It uses sequence-to-sequence transduction on the _character level_ with a model architecture akin to neural machine translation, i.e. a stacked **encoder-decoder** network with attention mechanism. \n\nThe **attention model** always applies to full lines (in a _global_ configuration), and uses a linear _additive_ alignment model. (This transfers information between the encoder and decoder hidden layer states, and calculates a _soft alignment_ between input and output characters. It is imperative for character-level processing, because with a simple final-initial transfer, models tend to start \"forgetting\" the input altogether at some point in the line and behave like unconditional LM generators.)\n\n...FIXME: mention: \n- stacked architecture (with bidirectional bottom and attentional top), configurable depth/width\n- weight tying\n- underspecification and gap\n- confidence input and alternative input\n- CPU/GPU option\n- incremental training, LM transfer, shallow transfer\n- evaluation (CER, PPL)\n\n### Processing PAGE annotations\n\nWhen applied on PAGE-XML (as OCR-D workspace processor), this component also allows processing below the `TextLine` hierarchy level, i.e. on `Word` or `Glyph` level. For that it uses the soft alignment scores to calculate an optimal hard alignment path for characters, and thereby distributes the transduction onto the lower level elements (keeping their coordinates and other meta-data), while changing Word segmentation if necessary.\n\n...\n\n### Architecture\n\n...FIXME: show!\n\n### Input with confidence and/or alternatives\n\n...FIXME: explain!\n\n### Multi-OCR input\n\nnot yet!\n\n### Modes\n\nWhile the _encoder_ can always be run in parallel over a batch of lines and by passing the full sequence of characters in one tensor (padded to the longest line in the batch), which is very efficient with Keras backends like Tensorflow, a **beam-search** _decoder_ requires passing initial/final states character-by-character, with parallelism employed to capture multiple history hypotheses of a single line. However, one can also **greedily** use the best output only for each position (without beam search). And in doing so, another option is to feed back the softmax output directly into the decoder input instead of its argmax unit vector. This effectively passes the full probability distribution from state to state, which (not very surprisingly) can increase correction accuracy quite a lot \u2013 it can get as good as a medium-sized beam search results. This latter option also allows to run in parallel again, which is also much faster \u2013 consuming up to ten times less CPU time.\n\nThererfore, the backend function `lib.Sequence2Sequence.correct_lines` can operate the encoder-decoder network in either of the following modes:\n\n#### _fast_\n\nDecode greedily, but feeding back the full softmax distribution in batch mode.\n\n#### _greedy_\n\nDecode greedily, but feeding back the argmax unit vectors for each line separately.\n\n#### _default_\n\nDecode beamed, feeding back the argmax unit vectors for the best history/output hypotheses of each line. 
More specifically:\n\n> Start decoder with start-of-sequence, then keep decoding until\n> end-of-sequence is found or output length is way off, repeatedly.\n> Decode by using the best predicted output characters and several next-best\n> alternatives (up to some degradation threshold) as next input.\n> Follow-up on the N best overall candidates (estimated by accumulated\n> score, normalized by length and prospective cost), i.e. do A*-like\n> breadth-first search, with N equal `batch_size`.\n> Pass decoder initial/final states from character to character,\n> for each candidate respectively.\n> Reserve 1 candidate per iteration for running through `source_seq`\n> (as a rejection fallback) to ensure that path does not fall off the\n> beam and at least one solution can be found within the search limits.\n\n### Evaluation\n\nText lines can be compared (by aligning and computing a distance under some metric) across multiple inputs. (This would typically be GT and OCR vs post-correction.) This can be done both on plain text files (`cor-asv-ann-eval`) and PAGE-XML annotations (`ocrd-cor-asv-ann-evaluate`).\n\nThere are a number of distance metrics available:\n- `Levenshtein`: simple unweighted edit distance (fastest, standard)\n- `combining-e-umlauts`: like the former, but umlauts with combining letter e get smaller distance to precomposed umlauts (and vice versa), as in \"Wu\u0364\u017fte\" (as opposed to \"W\u00fc\u017fte\")\n- `historic_latin`: like the former, but with additional exceptions (i.e. zero distances) for certain (isolated) character confusions \u2013 roughly the difference between GT level 1 and 2\n- `NFC`: like `Levenshtein`, but apply Unicode normal form with canonical composition before (i.e. less than `historic_latin`)\n- `NFKC`: like `Levenshtein`, but apply Unicode normal form with compatibility composition before (i.e. more than `historic_latin`)\n\n\n## Installation\n\nRequired Ubuntu packages:\n\n* Python (``python`` or ``python3``)\n* pip (``python-pip`` or ``python3-pip``)\n* virtualenv (``python-virtualenv`` or ``python3-virtualenv``)\n\nCreate and activate a virtualenv as usual.\n\nTo install Python dependencies and this module, then do:\n```shell\nmake deps install\n```\nWhich is the equivalent of:\n```shell\npip install -r requirements.txt\npip install -e .\n```\n\n## Usage\n\nThis packages has the following user interfaces:\n\n### command line interface `cor-asv-ann-train`\n\nTo be used with string arguments and plain-text files.\n\n...\n\n### command line interface `cor-asv-ann-eval`\n\nTo be used with string arguments and plain-text files.\n\n...\n\n### command line interface `cor-asv-ann-repl`\n\ninteractive\n\n...\n\n### [OCR-D processor](https://github.com/OCR-D/core) interface `ocrd-cor-asv-ann-process`\n\nTo be used with [PageXML](https://www.primaresearch.org/tools/PAGELibraries) documents in an [OCR-D](https://github.com/OCR-D/spec/) annotation workflow. Input could be anything with a textual annotation (`TextEquiv` on the given `textequiv_level`). 
\n\n...\n\n```json\n    \"ocrd-cor-asv-ann-process\": {\n      \"executable\": \"ocrd-cor-asv-ann-process\",\n      \"categories\": [\n        \"Text recognition and optimization\"\n      ],\n      \"steps\": [\n        \"recognition/post-correction\"\n      ],\n      \"description\": \"Improve text annotation by character-level encoder-attention-decoder ANN model\",\n      \"input_file_grp\": [\n        \"OCR-D-OCR-TESS\",\n        \"OCR-D-OCR-KRAK\",\n        \"OCR-D-OCR-OCRO\",\n        \"OCR-D-OCR-CALA\",\n        \"OCR-D-OCR-ANY\"\n      ],\n      \"output_file_grp\": [\n        \"OCR-D-COR-ASV\"\n      ],\n      \"parameters\": {\n        \"model_file\": {\n          \"type\": \"string\",\n          \"format\": \"uri\",\n          \"content-type\": \"application/x-hdf;subtype=bag\",\n          \"description\": \"path of h5py weight/config file for model trained with cor-asv-ann-train\",\n          \"required\": true,\n          \"cacheable\": true\n        },\n        \"textequiv_level\": {\n          \"type\": \"string\",\n          \"enum\": [\"line\", \"word\", \"glyph\"],\n          \"default\": \"glyph\",\n          \"description\": \"PAGE XML hierarchy level to read/write TextEquiv input/output on\"\n        }\n      }\n    }\n```\n\n...\n\n### [OCR-D processor](https://github.com/OCR-D/core) interface `ocrd-cor-asv-ann-evaluate`\n\nTo be used with [PageXML](https://www.primaresearch.org/tools/PAGELibraries) documents in an [OCR-D](https://github.com/OCR-D/spec/) annotation workflow. Inputs could be anything with a textual annotation (`TextEquiv` on the line level), but at least 2. The first in the list of input file groups will be regarded as reference/GT.\n\n...\n\n```json\n    \"ocrd-cor-asv-ann-evaluate\": {\n      \"executable\": \"ocrd-cor-asv-ann-evaluate\",\n      \"categories\": [\n        \"Text recognition and optimization\"\n      ],\n      \"steps\": [\n        \"recognition/evaluation\"\n      ],\n      \"description\": \"Align different textline annotations and compute distance\",\n      \"parameters\": {\n        \"metric\": {\n          \"type\": \"string\",\n          \"enum\": [\"Levenshtein\", \"combining-e-umlauts\", \"NFC\", \"NFKC\", \"historic_latin\"],\n          \"default\": \"Levenshtein\",\n          \"description\": \"Distance metric to calculate and aggregate\"\n        }\n      }\n    }\n```\n\n...\n\n## Testing\n\nnot yet!\n...\n",
+            "ocrd-tool.json": "{\n  \"git_url\": \"https://github.com/ASVLeipzig/cor-asv-ann\",\n  \"version\": \"0.1.0\",\n  \"tools\": {\n    \"ocrd-cor-asv-ann-process\": {\n      \"executable\": \"ocrd-cor-asv-ann-process\",\n      \"categories\": [\n        \"Text recognition and optimization\"\n      ],\n      \"steps\": [\n        \"recognition/post-correction\"\n      ],\n      \"description\": \"Improve text annotation by character-level encoder-attention-decoder ANN model\",\n      \"input_file_grp\": [\n        \"OCR-D-OCR-TESS\",\n        \"OCR-D-OCR-KRAK\",\n        \"OCR-D-OCR-OCRO\",\n        \"OCR-D-OCR-CALA\",\n        \"OCR-D-OCR-ANY\"\n      ],\n      \"output_file_grp\": [\n        \"OCR-D-COR-ASV\"\n      ],\n      \"parameters\": {\n        \"model_file\": {\n          \"type\": \"string\",\n          \"format\": \"uri\",\n          \"content-type\": \"application/x-hdf;subtype=bag\",\n          \"description\": \"path of h5py weight/config file for model trained with cor-asv-ann-train\",\n          \"required\": true,\n          \"cacheable\": true\n        },\n        \"textequiv_level\": {\n          \"type\": \"string\",\n          \"enum\": [\"line\", \"word\", \"glyph\"],\n          \"default\": \"glyph\",\n          \"description\": \"PAGE XML hierarchy level to read/write TextEquiv input/output on\"\n        }\n      }\n    },\n    \"ocrd-cor-asv-ann-evaluate\": {\n      \"executable\": \"ocrd-cor-asv-ann-evaluate\",\n      \"categories\": [\n        \"Text recognition and optimization\"\n      ],\n      \"steps\": [\n        \"recognition/evaluation\"\n      ],\n      \"description\": \"Align different textline annotations and compute distance\",\n      \"parameters\": {\n        \"metric\": {\n          \"type\": \"string\",\n          \"enum\": [\"Levenshtein\", \"combining-e-umlauts\", \"NFC\", \"NFKC\", \"historic_latin\"],\n          \"default\": \"Levenshtein\",\n          \"description\": \"Distance metric to calculate and aggregate\"\n        }\n      }\n    }\n  }\n}\n",
+            "setup.py": "# -*- coding: utf-8 -*-\n\"\"\"\nInstalls:\n    - cor-asv-ann-train\n    - cor-asv-ann-eval\n    - cor-asv-ann-repl\n    - ocrd-cor-asv-ann-process\n    - ocrd-cor-asv-ann-evaluate\n\"\"\"\nimport codecs\n\nfrom setuptools import setup, find_packages\n\ninstall_requires = open('requirements.txt').read().split('\\n')\n\nwith codecs.open('README.md', encoding='utf-8') as f:\n    README = f.read()\n\nsetup(\n    name='ocrd_cor_asv_ann',\n    version='0.1.1',\n    description='sequence-to-sequence translator for noisy channel error correction',\n    long_description=README,\n    author='Robert Sachunsky',\n    author_email='sachunsky@informatik.uni-leipzig.de',\n    url='https://github.com/ASVLeipzig/cor-asv-ann',\n    license='Apache License 2.0',\n    packages=find_packages(exclude=('tests', 'docs')),\n    install_requires=install_requires,\n    package_data={\n        '': ['*.json', '*.yml', '*.yaml'],\n    },\n    entry_points={\n        'console_scripts': [\n            'cor-asv-ann-train=ocrd_cor_asv_ann.scripts.train:cli',\n            'cor-asv-ann-eval=ocrd_cor_asv_ann.scripts.eval:cli',\n            'cor-asv-ann-repl=ocrd_cor_asv_ann.scripts.repl:cli',\n            'ocrd-cor-asv-ann-process=ocrd_cor_asv_ann.wrapper.cli:ocrd_cor_asv_ann_process',\n            'ocrd-cor-asv-ann-evaluate=ocrd_cor_asv_ann.wrapper.cli:ocrd_cor_asv_ann_evaluate',\n        ]\n    },\n)\n"
+        },
+        "git": {
+            "last_commit": "Tue Oct 22 11:23:54 2019 +0200",
+            "number_of_commits": "37"
+        },
+        "name": "cor-asv-ann",
+        "ocrd_tool": {
+            "git_url": "https://github.com/ASVLeipzig/cor-asv-ann",
+            "tools": {
+                "ocrd-cor-asv-ann-evaluate": {
+                    "categories": [
+                        "Text recognition and optimization"
+                    ],
+                    "description": "Align different textline annotations and compute distance",
+                    "executable": "ocrd-cor-asv-ann-evaluate",
+                    "parameters": {
+                        "metric": {
+                            "default": "Levenshtein",
+                            "description": "Distance metric to calculate and aggregate",
+                            "enum": [
+                                "Levenshtein",
+                                "combining-e-umlauts",
+                                "NFC",
+                                "NFKC",
+                                "historic_latin"
+                            ],
+                            "type": "string"
+                        }
+                    },
+                    "steps": [
+                        "recognition/evaluation"
+                    ]
+                },
+                "ocrd-cor-asv-ann-process": {
+                    "categories": [
+                        "Text recognition and optimization"
+                    ],
+                    "description": "Improve text annotation by character-level encoder-attention-decoder ANN model",
+                    "executable": "ocrd-cor-asv-ann-process",
+                    "input_file_grp": [
+                        "OCR-D-OCR-TESS",
+                        "OCR-D-OCR-KRAK",
+                        "OCR-D-OCR-OCRO",
+                        "OCR-D-OCR-CALA",
+                        "OCR-D-OCR-ANY"
+                    ],
+                    "output_file_grp": [
+                        "OCR-D-COR-ASV"
+                    ],
+                    "parameters": {
+                        "model_file": {
+                            "cacheable": true,
+                            "content-type": "application/x-hdf;subtype=bag",
+                            "description": "path of h5py weight/config file for model trained with cor-asv-ann-train",
+                            "format": "uri",
+                            "required": true,
+                            "type": "string"
+                        },
+                        "textequiv_level": {
+                            "default": "glyph",
+                            "description": "PAGE XML hierarchy level to read/write TextEquiv input/output on",
+                            "enum": [
+                                "line",
+                                "word",
+                                "glyph"
+                            ],
+                            "type": "string"
+                        }
+                    },
+                    "steps": [
+                        "recognition/post-correction"
+                    ]
+                }
+            },
+            "version": "0.1.0"
+        },
+        "ocrd_tool_validate": "<report valid=\"false\">\n  <error>[tools.ocrd-cor-asv-ann-evaluate] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-cor-asv-ann-evaluate.steps.0] 'recognition/evaluation' is not one of ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/text-nontext', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis']</error>\n</report>",
+        "org_plus_name": "ASVLeipzig/cor-asv-ann",
+        "python": {
+            "author": "Robert Sachunsky",
+            "author-email": "sachunsky@informatik.uni-leipzig.de",
+            "name": "ocrd_cor_asv_ann",
+            "url": "https://github.com/ASVLeipzig/cor-asv-ann"
+        },
+        "url": "https://github.com/ASVLeipzig/cor-asv-ann"
+    },
+    {
+        "files": {
+            "Dockerfile": null,
+            "README.md": "# cor-asv-fst\n    OCR post-correction with error/lexicon Finite State Transducers and\n    chararacter-level LSTM language models\n\n## Introduction\n\n\n## Installation\n\nRequired Ubuntu packages:\n\n* Python (``python`` or ``python3``)\n* pip (``python-pip`` or ``python3-pip``)\n* virtualenv (``python-virtualenv`` or ``python3-virtualenv``)\n\nCreate and activate a virtualenv as usual.\n\nTo install Python dependencies and this module, then do:\n```shell\nmake deps install\n```\nWhich is the equivalent of:\n```shell\npip install -r requirements.txt\npip install -e .\n```\n\nIn addition to the requirements listed in `requirements.txt`, the tool\nrequires the\n[pynini](http://www.opengrm.org/twiki/bin/view/GRM/Pynini)\nlibrary, which has to be installed from source.\n\n## Usage\n\nThe package has two user interfaces:\n\n### Command Line Interface\n\nThe package contains a suite of CLI tools to work with plaintext data (prefix:\n`cor-asv-fst-*`). The minimal working examples and data formats are described\nbelow. Additionally, each tool has further optional parameters - for a detailed\ndescription, call the tool with the `--help` option.\n\n#### `cor-asv-fst-train`\n\nTrain FST models. The basic invocation is as follows:\n\n```shell\ncor-asv-fst-train -l LEXICON_FILE -e ERROR_MODEL_FILE -t TRAINING_FILE\n```\n\nThis will create two transducers, which will be stored in `LEXICON_FILE` and\n`ERROR_MODEL_FILE`, respectively. As the training of the lexicon and the error\nmodel is done independently, any of them can be skipped by omitting the\nrespective parameter.\n\n`TRAINING_FILE` is a plain text file in tab-separated, two-column format\ncontaining a line of OCR-output and the corresponding ground truth line:\n\n```\n\u00bb Bergebt mir, da\u00df ih niht wei\u00df, wie\t\u00bbVergebt mir, da\u00df ich nicht wei\u00df, wie\naus dem (Gei\u017fte aller Nationen Mahrunq\taus dem Gei\u017fte aller Nationen Nahrung\nKann\u017ft Du mir die re<h\u00e9e Bahn nich\u00e9 zeigen ?\tKann\u017ft Du mir die rechte Bahn nicht zeigen?\nfrag zu bringen. \u2014\ttrag zu bringen. \u2014\n\u017fie ins irdij<he Leben hinein, Mit leichtem,\t\u017fie ins irdi\u017fche Leben hinein. Mit leichtem,\n```\n\nEach line is treated independently. Alternatively to the above, the training\ndata may also be supplied as two files:\n\n```shell\ncor-asv-fst-train -l LEXICON_FILE -e ERROR_MODEL_FILE -i INPUT_FILE -g GT_FILE\n```\n\nIn this variant, `INPUT_FILE` and `GT_FILE` are both in tab-separated,\ntwo-column format, in which the first column is the line ID and the second the\nline:\n\n```\n>=== INPUT_FILE ===<\nalexis_ruhe01_1852_0018_022     ih denke. Aber was die \u017felige Frau Geheimr\u00e4th1n\nalexis_ruhe01_1852_0035_019     \u201eDas fann ich niht, c\u2019esl absolument impos-\nalexis_ruhe01_1852_0087_027     rend. In dem Augenbli> war 1hr niht wohl zu\nalexis_ruhe01_1852_0099_012     \u00fcr die fle \u017fich \u017fchlugen.\u201c\nalexis_ruhe01_1852_0147_009     \u017follte. Nur \u00dcber die Familien, wo man \u017fie einf\u00fchren\n\n>=== GT_FILE ===<\nalexis_ruhe01_1852_0018_022     ich denke. Aber was die \u017felige Frau Geheimr\u00e4thin\nalexis_ruhe01_1852_0035_019     \u201eDas kann ich nicht, c'est absolument impos\u2014\nalexis_ruhe01_1852_0087_027     rend. Jn dem Augenblick war ihr nicht wohl zu\nalexis_ruhe01_1852_0099_012     f\u00fcr die \u017fie \u017fich \u017fchlugen.\u201c\nalexis_ruhe01_1852_0147_009     \u017follte. 
Nur \u00fcber die Familien, wo man \u017fie einf\u00fchren\n```\n\n#### `cor-asv-fst-process`\n\nThis tool applies a trained model to correct plaintext data on a line basis.\nThe basic invocation is:\n\n```shell\ncor-asv-fst-process -i INPUT_FILE -o OUTPUT_FILE -l LEXICON_FILE -e ERROR_MODEL_FILE (-m LM_FILE)\n```\n\n`INPUT_FILE` is in the same format as for the training procedure. `OUTPUT_FILE`\ncontains the post-correction results in the same format.\n\n`LM_FILE` is a `ocrd_keraslm` language model - if supplied, it is used for\nrescoring.\n\n#### `cor-asv-fst-evaluate`\n\nThis tool can be used to evaluate the post-correction results. The minimal\nworking invocation is:\n\n```shell\ncor-asv-fst-evaluate -i INPUT_FILE -o OUTPUT_FILE -g GT_FILE\n```\n\nAdditionally, the parameter `-M` can be used to select the evaluation measure\n(`Levenshtein` by default). The files should be in the same two-column format\nas described above.\n\n### [OCR-D processor](https://github.com/OCR-D/core) interface `ocrd-cor-asv-fst-process`\n\nTo be used with [PageXML](https://www.primaresearch.org/tools/PAGELibraries)\ndocuments in an [OCR-D](https://github.com/OCR-D/spec/) annotation workflow.\nInput could be anything with a textual annotation (`TextEquiv` on the given\n`textequiv_level`).\n\n...\n\n```json\n  \"tools\": {\n    \"cor-asv-fst-process\": {\n      \"executable\": \"cor-asv-fst-process\",\n      \"categories\": [\n        \"Text recognition and optimization\"\n      ],\n      \"steps\": [\n        \"recognition/post-correction\"\n      ],\n      \"description\": \"Improve text annotation by FST error and lexicon model with character-level LSTM language model\",\n      \"input_file_grp\": [\n        \"OCR-D-OCR-TESS\",\n        \"OCR-D-OCR-KRAK\",\n        \"OCR-D-OCR-OCRO\",\n        \"OCR-D-OCR-CALA\",\n        \"OCR-D-OCR-ANY\"\n      ],\n      \"output_file_grp\": [\n        \"OCR-D-COR-ASV\"\n      ],\n      \"parameters\": {\n        \"keraslm_file\": {\n          \"type\": \"string\",\n          \"format\": \"uri\",\n          \"content-type\": \"application/x-hdf;subtype=bag\",\n          \"description\": \"path of h5py weight/config file for language model trained with keraslm\",\n          \"required\": true,\n          \"cacheable\": true\n        },\n        \"errorfst_file\": {\n          \"type\": \"string\",\n          \"format\": \"uri\",\n          \"content-type\": \"application/vnd.openfst\",\n          \"description\": \"path of FST file for error model\",\n          \"required\": true,\n          \"cacheable\": true\n        },\n        \"lexiconfst_file\": {\n          \"type\": \"string\",\n          \"format\": \"uri\",\n          \"content-type\": \"application/vnd.openfst\",\n          \"description\": \"path of FST file for lexicon model\",\n          \"required\": true,\n          \"cacheable\": true\n        },\n        \"textequiv_level\": {\n          \"type\": \"string\",\n          \"enum\": [\"word\", \"glyph\"],\n          \"default\": \"glyph\",\n          \"description\": \"PAGE XML hierarchy level to read TextEquiv input on (output will always be word level)\"\n        },\n        \"beam_width\": {\n          \"type\": \"number\",\n          \"format\": \"integer\",\n          \"description\": \"maximum number of best partial paths to consider during beam search in language modelling\",\n          \"default\": 100\n        },\n        \"lm_weight\": {\n          \"type\": \"number\",\n          \"format\": \"float\",\n          \"description\": \"share of the LM 
scores over the FST output confidences\",\n          \"default\": 0.5\n        }\n      }\n    }\n  }\n```\n\n...\n\n## Testing\n\n...\n",
+            "ocrd-tool.json": null,
+            "setup.py": "# -*- coding: utf-8 -*-\n\"\"\"\nInstalls:\n    - cor-asv-fst-train\n    - cor-asv-fst-process\n    - cor-asv-fst-evaluate\n    - ocrd-cor-asv-fst-process\n\"\"\"\nimport codecs\n\nfrom setuptools import setup, find_packages\n\ninstall_requires = open('requirements.txt').read().split('\\n')\n\nwith codecs.open('README.md', encoding='utf-8') as f:\n    README = f.read()\n\nsetup(\n    name='ocrd_cor_asv_fst',\n    version='0.2.0',\n    description='OCR post-correction with error/lexicon Finite State '\n                'Transducers and character-level LSTMs',\n    long_description=README,\n    author='Maciej Sumalvico, Robert Sachunsky',\n    author_email='sumalvico@informatik.uni-leipzig.de, '\n                 'sachunsky@informatik.uni-leipzig.de',\n    url='https://github.com/ASVLeipzig/cor-asv-fst',\n    license='Apache License 2.0',\n    packages=find_packages(exclude=('tests', 'docs')),\n    install_requires=install_requires,\n    package_data={\n        '': ['*.json', '*.yml', '*.yaml'],\n    },\n    test_suite='tests',\n    entry_points={\n        'console_scripts': [\n            'cor-asv-fst-train=ocrd_cor_asv_fst.scripts.train:main',\n            'cor-asv-fst-process=ocrd_cor_asv_fst.scripts.process:main',\n            'cor-asv-fst-evaluate=ocrd_cor_asv_fst.scripts.evaluate:main',\n            'ocrd-cor-asv-fst-process=ocrd_cor_asv_fst.wrapper.cli:ocrd_cor_asv_fst',\n        ]\n    }\n)\n"
+        },
+        "git": {
+            "last_commit": "Tue Jul 23 17:00:16 2019 +0200",
+            "number_of_commits": "172"
+        },
+        "name": "cor-asv-fst",
+        "ocrd_tool": "",
+        "ocrd_tool_validate": "NO ocrd-tool.json",
+        "org_plus_name": "ASVLeipzig/cor-asv-fst",
+        "python": {
+            "author": "Maciej Sumalvico, Robert Sachunsky",
+            "author-email": "sumalvico@informatik.uni-leipzig.de, sachunsky@informatik.uni-leipzig.de",
+            "name": "ocrd_cor_asv_fst",
+            "url": "https://github.com/ASVLeipzig/cor-asv-fst"
+        },
+        "url": "https://github.com/ASVLeipzig/cor-asv-fst"
+    },
+    {
+        "files": {
+            "Dockerfile": null,
+            "README.md": "# ocrd_calamari\n\nRecognize text using [Calamari OCR](https://github.com/Calamari-OCR/calamari).\n\nIntroduction\n-------------\n\nThis offers a OCR-D compliant workspace processor for some of the functionality of Calamari OCR.\n\nThis processor only operates on the text line level and so needs a line segmentation (and by extension a binarized \nimage) as its input.\n\nExample Usage\n-------------\n\n~~~\nocrd-calamari-recognize -p test-parameters.json -m mets.xml -I OCR-D-SEG-LINE -O OCR-D-OCR-CALAMARI\n~~~\n\nWith `test-parameters.json`:\n~~~\n{\n    \"checkpoint\": \"/path/to/some/trained/models/*.ckpt.json\"\n}\n~~~\n\nTODO\n----\n\n* Support Calamari's \"extended prediction data\" output\n* Currently, the processor only supports a prediction using confidence voting of multiple models. While this is\n  superior, it makes sense to support single model prediction, too.\n",
             "ocrd-tool.json": "{\n  \"git_url\": \"https://github.com/kba/ocrd_calamari\",\n  \"version\": \"0.0.1\",\n  \"tools\": {\n    \"ocrd-calamari-recognize\": {\n      \"executable\": \"ocrd-calamari-recognize\",\n      \"categories\": [\n        \"Text recognition and optimization\"\n      ],\n      \"steps\": [\n        \"recognition/text-recognition\"\n      ],\n      \"description\": \"Recognize lines with Calamari\",\n      \"input_file_grp\": [\n        \"OCR-D-SEG-LINE\"\n      ],\n      \"output_file_grp\": [\n        \"OCR-D-OCR-CALAMARI\"\n      ],\n      \"parameters\": {\n        \"checkpoint\": {\"type\": \"string\", \"format\": \"file\", \"cacheable\": true},\n        \"voter\": {\"type\": \"string\", \"default\": \"confidence_voter_default_ctc\"}\n      }\n    }\n  }\n}\n",
             "setup.py": "# -*- coding: utf-8 -*-\nimport codecs\n\nfrom setuptools import setup, find_packages\n\nsetup(\n    name='ocrd_calamari',\n    version='0.0.1',\n    description='Calamari bindings',\n    long_description=codecs.open('README.md', encoding='utf-8').read(),\n    author='Konstantin Baierer, Mike Gerber',\n    author_email='unixprog@gmail.com, mike.gerber@sbb.spk-berlin.de',\n    url='https://github.com/kba/ocrd_calamari',\n    license='Apache License 2.0',\n    packages=find_packages(exclude=('tests', 'docs')),\n    install_requires=open('requirements.txt').read().split('\\n'),\n    package_data={\n        '': ['*.json', '*.yml', '*.yaml'],\n    },\n    entry_points={\n        'console_scripts': [\n            'ocrd-calamari-recognize=ocrd_calamari.cli:ocrd_calamari_recognize',\n        ]\n    },\n)\n"
         },
         "git": {
-            "last_commit": "Tue Oct 22 11:00:05 2019 +0200",
-            "number_of_commits": "26"
+            "last_commit": "Fri Sep 27 15:52:03 2019 +0200",
+            "number_of_commits": "24"
         },
         "name": "ocrd_calamari",
         "ocrd_tool": {
@@ -62,8 +177,8 @@
             "setup.py": null
         },
         "git": {
-            "last_commit": "Wed Jul 18 18:17:27 2018 +0200",
-            "number_of_commits": "6"
+            "last_commit": "Tue Jun 26 18:30:04 2018 +0200",
+            "number_of_commits": "5"
         },
         "name": "ocrd_im6convert",
         "ocrd_tool": {
@@ -93,7 +208,7 @@
             },
             "version": "0.0.1"
         },
-        "ocrd_tool_validate": "<report valid=\"false\">\n  <error>[tools.ocrd-im6convert] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-im6convert] 'output_file_grp' is a required property</error>\n  <error>[tools.ocrd-im6convert.parameters.output-format] 'description' is a required property</error>\n</report>",
+        "ocrd_tool_validate": "<report valid=\"false\">\n  <error>[tools.ocrd-im6convert] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-im6convert.parameters.output-format] 'description' is a required property</error>\n</report>",
         "org_plus_name": "OCR-D/ocrd_im6convert",
         "python": {
             "author": "",
@@ -111,7 +226,7 @@
             "setup.py": "# -*- coding: utf-8 -*-\n\"\"\"\nInstalls:\n    - keraslm-rate\n    - ocrd-keraslm-rate\n\"\"\"\nimport codecs\n\nfrom setuptools import setup, find_packages\n\nwith codecs.open('README.md', encoding='utf-8') as f:\n    README = f.read()\n\nsetup(\n    name='ocrd_keraslm',\n    version='0.3.1',\n    description='character-level language modelling in Keras',\n    long_description=README,\n    author='Konstantin Baierer, Kay-Michael W\u00fcrzner',\n    author_email='unixprog@gmail.com, wuerzner@gmail.com',\n    url='https://github.com/OCR-D/ocrd_keraslm',\n    license='Apache License 2.0',\n    packages=find_packages(exclude=('tests', 'docs')),\n    install_requires=open('requirements.txt').read().split('\\n'),\n    extras_require={\n        'plotting': [\n            'sklearn',\n            'matplotlib',\n            ]\n    },\n    package_data={\n        '': ['*.json', '*.yml', '*.yaml'],\n    },\n    entry_points={\n        'console_scripts': [\n            'keraslm-rate=ocrd_keraslm.scripts.run:cli',\n            'ocrd-keraslm-rate=ocrd_keraslm.wrapper.cli:ocrd_keraslm_rate',\n        ]\n    },\n)\n"
         },
         "git": {
-            "last_commit": "Tue Oct 22 10:57:38 2019 +0200",
+            "last_commit": "Tue Oct 22 11:25:28 2019 +0200",
             "number_of_commits": "81"
         },
         "name": "ocrd_keraslm",
@@ -181,7 +296,7 @@
             },
             "version": "0.3.1"
         },
-        "ocrd_tool_validate": "<report valid=\"false\">\n  <error>[tools.ocrd-keraslm-rate.parameters.model_file.content-type] 'application/x-hdf;subtype=bag' does not match '^[a-z0-9\\\\._-]+/[A-Za-z0-9\\\\._\\\\+-]+$'</error>\n</report>",
+        "ocrd_tool_validate": "<report valid=\"true\">\n</report>",
         "org_plus_name": "OCR-D/ocrd_keraslm",
         "python": {
             "author": "Konstantin Baierer, Kay-Michael W\u00fcrzner",
@@ -296,7 +411,7 @@
             },
             "version": "0.0.2"
         },
-        "ocrd_tool_validate": "<report valid=\"false\">\n  <error>[tools.ocrd-kraken-binarize.input_file_grp] 'OCR-D-IMG' is not of type 'array'</error>\n  <error>[tools.ocrd-kraken-binarize.output_file_grp] 'OCR-D-IMG-BIN' is not of type 'array'</error>\n  <error>[tools.ocrd-kraken-binarize.parameters.level-of-operation] 'description' is a required property</error>\n  <error>[tools.ocrd-kraken-segment] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-kraken-segment] 'output_file_grp' is a required property</error>\n  <error>[tools.ocrd-kraken-segment.parameters.maxcolseps] 'description' is a required property</error>\n  <error>[tools.ocrd-kraken-segment.parameters.scale] 'description' is a required property</error>\n  <error>[tools.ocrd-kraken-segment.parameters.black_colseps] 'description' is a required property</error>\n  <error>[tools.ocrd-kraken-segment.parameters.white_colseps] 'description' is a required property</error>\n  <error>[tools.ocrd-kraken-ocr] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-kraken-ocr] 'output_file_grp' is a required property</error>\n  <error>[tools.ocrd-kraken-ocr.parameters.lines-json.required] 'true' is not of type 'boolean'</error>\n</report>",
+        "ocrd_tool_validate": "<report valid=\"false\">\n  <error>[tools.ocrd-kraken-binarize.input_file_grp] 'OCR-D-IMG' is not of type 'array'</error>\n  <error>[tools.ocrd-kraken-binarize.output_file_grp] 'OCR-D-IMG-BIN' is not of type 'array'</error>\n  <error>[tools.ocrd-kraken-binarize.parameters.level-of-operation] 'description' is a required property</error>\n  <error>[tools.ocrd-kraken-segment] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-kraken-segment.parameters.maxcolseps] 'description' is a required property</error>\n  <error>[tools.ocrd-kraken-segment.parameters.scale] 'description' is a required property</error>\n  <error>[tools.ocrd-kraken-segment.parameters.black_colseps] 'description' is a required property</error>\n  <error>[tools.ocrd-kraken-segment.parameters.white_colseps] 'description' is a required property</error>\n  <error>[tools.ocrd-kraken-ocr] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-kraken-ocr.parameters.lines-json.required] 'true' is not of type 'boolean'</error>\n</report>",
         "org_plus_name": "OCR-D/ocrd_kraken",
         "python": {
             "author": "Konstantin Baierer, Kay-Michael W\u00fcrzner",
@@ -308,14 +423,14 @@
     },
     {
         "files": {
-            "Dockerfile": "FROM ocrd/core\nMAINTAINER OCR-D\nENV DEBIAN_FRONTEND noninteractive\nENV PYTHONIOENCODING utf8\nENV LC_ALL C.UTF-8\nENV LANG C.UTF-8\n\nWORKDIR /build-ocrd\nCOPY setup.py .\nCOPY requirements.txt .\nCOPY README.rst .\nRUN apt-get update && \\\n    apt-get -y install --no-install-recommends \\\n    ca-certificates \\\n    make \\\n    git\nCOPY ocrd_ocropy ./ocrd_ocropy\nRUN pip3 install --upgrade pip\nRUN make deps-pip install\n\nENTRYPOINT [\"/bin/sh\", \"-c\"]\n",
+            "Dockerfile": "FROM ocrd/core\nMAINTAINER OCR-D\nENV DEBIAN_FRONTEND noninteractive\nENV PYTHONIOENCODING utf8\nENV LC_ALL C.UTF-8\nENV LANG C.UTF-8\n\nWORKDIR /build-ocrd\nCOPY setup.py .\nCOPY requirements.txt .\nCOPY README.md .\nRUN apt-get update && \\\n    apt-get -y install --no-install-recommends \\\n    ca-certificates \\\n    make \\\n    git\nCOPY ocrd_ocropy ./ocrd_ocropy\nRUN pip3 install --upgrade pip\nRUN make deps install\n\nENTRYPOINT [\"/bin/sh\", \"-c\"]\n",
             "README.md": "# ocrd_ocropy\n\n[![image](https://travis-ci.org/OCR-D/ocrd_ocropy.svg?branch=master)](https://travis-ci.org/OCR-D/ocrd_ocropy)\n\n[![Docker Automated build](https://img.shields.io/docker/automated/ocrd/ocropy.svg)](https://hub.docker.com/r/ocrd/ocropy/tags/)\n\n> Wrapper for the ocropy OCR engine\n",
             "ocrd-tool.json": "{\n  \"version\": \"0.0.1\",\n  \"git_url\": \"https://github.com/OCR-D/ocrd_ocropy\",\n  \"tools\": {\n    \"ocrd-ocropy-segment\": {\n      \"executable\": \"ocrd-ocropy-segment\",\n      \"categories\": [\"Image preprocessing\"],\n      \"steps\": [\"layout/segmentation/region\"],\n      \"description\": \"Segment page\",\n      \"input_file_grp\": [\"OCR-D-IMG-BIN\"],\n      \"output_file_grp\": [\"OCR-D-SEG-LINE\"],\n      \"parameters\": {\n        \"maxcolseps\":  {\"type\": \"number\", \"description\": \"has an effect\", \"default\": 3},\n        \"maxseps\":     {\"type\": \"number\", \"description\": \"has an effect\", \"default\": 0},\n        \"sepwiden\":    {\"type\": \"number\", \"description\": \"has an effect\", \"default\": 10},\n        \"csminheight\": {\"type\": \"number\", \"description\": \"has an effect\", \"default\": 10},\n        \"csminaspect\": {\"type\": \"number\", \"description\": \"has an effect\", \"default\": 1.1},\n        \"pad\":         {\"type\": \"number\", \"description\": \"has an effect\", \"default\": 3},\n        \"expand\":      {\"type\": \"number\", \"description\": \"has an effect\", \"default\": 3},\n        \"usegauss\":    {\"type\": \"boolean\",\"description\": \"has an effect\", \"default\": false},\n        \"threshold\":   {\"type\": \"number\", \"description\": \"has an effect\", \"default\": 0.2},\n        \"noise\":       {\"type\": \"number\", \"description\": \"has an effect\", \"default\": 8},\n        \"scale\":       {\"type\": \"number\", \"description\": \"has an effect\", \"default\": 0.0},\n        \"hscale\":      {\"type\": \"number\", \"description\": \"has an effect\", \"default\": 1.0},\n        \"vscale\":      {\"type\": \"number\", \"description\": \"has an effect\", \"default\": 1.0}\n      }\n    }\n  }\n}\n",
-            "setup.py": "# -*- coding: utf-8 -*-\n\"\"\"\nInstalls one binary:\n\n    - ocrd-ocropy-segment\n\"\"\"\nimport codecs\n\nfrom setuptools import setup\n\nsetup(\n    name='ocrd_ocropy',\n    version='0.0.1a1',\n    description='ocropy bindings',\n    long_description=codecs.open('README.md', encoding='utf-8').read(),\n    long_description_content_type='text/markdown',\n    author='Konstantin Baierer',\n    author_email='unixprog@gmail.com, wuerzner@gmail.com',\n    url='https://github.com/OCR-D/ocrd_ocropy',\n    license='Apache License 2.0',\n    packages=['ocrd_ocropy'],\n    install_requires=[\n        'ocrd >= 1.0.0b6',\n        'ocrd-fork-ocropy >= 1.4.0a3',\n        'click'\n    ],\n    package_data={\n        '': ['*.json', '*.yml', '*.yaml'],\n    },\n    entry_points={\n        'console_scripts': [\n            'ocrd-ocropy-segment=ocrd_ocropy.cli:ocrd_ocropy_segment',\n        ]\n    },\n)\n"
+            "setup.py": "# -*- coding: utf-8 -*-\n\"\"\"\nInstalls one binary:\n\n    - ocrd-ocropy-segment\n\"\"\"\nimport codecs\n\nfrom setuptools import setup\n\nsetup(\n    name='ocrd_ocropy',\n    version='0.0.3',\n    description='ocropy bindings',\n    long_description=codecs.open('README.md', encoding='utf-8').read(),\n    long_description_content_type='text/markdown',\n    author='Konstantin Baierer',\n    author_email='unixprog@gmail.com, wuerzner@gmail.com',\n    url='https://github.com/OCR-D/ocrd_ocropy',\n    license='Apache License 2.0',\n    packages=['ocrd_ocropy'],\n    install_requires=[\n        'ocrd >= 1.0.0b8',\n        'ocrd-fork-ocropy >= 1.4.0a3',\n        'click'\n    ],\n    package_data={\n        '': ['*.json', '*.yml', '*.yaml'],\n    },\n    entry_points={\n        'console_scripts': [\n            'ocrd-ocropy-segment=ocrd_ocropy.cli:ocrd_ocropy_segment',\n        ]\n    },\n)\n"
         },
         "git": {
-            "last_commit": "Thu Feb 28 16:30:26 2019 +0100",
-            "number_of_commits": "60"
+            "last_commit": "Tue Jun 11 14:51:00 2019 +0200",
+            "number_of_commits": "66"
         },
         "name": "ocrd_ocropy",
         "ocrd_tool": {
@@ -425,8 +540,8 @@
             "setup.py": null
         },
         "git": {
-            "last_commit": "Mon Sep 9 22:21:57 2019 +0200",
-            "number_of_commits": "47"
+            "last_commit": "Thu Oct 24 12:18:12 2019 +0200",
+            "number_of_commits": "60"
         },
         "name": "ocrd_olena",
         "ocrd_tool": {
@@ -505,7 +620,7 @@
         },
         "git": {
             "last_commit": "Tue Sep 10 08:31:29 2019 +0200",
-            "number_of_commits": "1"
+            "number_of_commits": "28"
         },
         "name": "ocrd_segment",
         "ocrd_tool": {
@@ -558,7 +673,7 @@
             },
             "version": "0.0.1"
         },
-        "ocrd_tool_validate": "<report valid=\"false\">\n  <error>[tools.ocrd-segment-evaluate] 'output_file_grp' is a required property</error>\n</report>",
+        "ocrd_tool_validate": "<report valid=\"true\">\n</report>",
         "org_plus_name": "OCR-D/ocrd_segment",
         "python": {
             "author": "Konstantin Baierer, Kay-Michael W\u00fcrzner, Robert Sachunsky",
@@ -573,11 +688,11 @@
             "Dockerfile": "FROM ocrd/core\nMAINTAINER OCR-D\nENV DEBIAN_FRONTEND noninteractive\nENV PYTHONIOENCODING utf8\nENV LC_ALL C.UTF-8\nENV LANG C.UTF-8\n\nWORKDIR /build-ocrd\nCOPY setup.py .\nCOPY requirements.txt .\nCOPY requirements_test.txt .\nCOPY README.rst .\nCOPY LICENSE .\nRUN apt-get update && \\\n    apt-get -y install --no-install-recommends \\\n    ca-certificates \\\n    make \\\n    git\nCOPY Makefile .\nRUN make deps-ubuntu\nCOPY ocrd_tesserocr ./ocrd_tesserocr\nRUN pip3 install --upgrade pip\nRUN make PYTHON=python3 PIP=pip3 deps install\nCOPY test ./test\nRUN make PYTHON=python3 PIP=pip3 deps-test\n\nENTRYPOINT [\"/bin/sh\", \"-c\"]\n",
             "README.md": null,
             "ocrd-tool.json": "{\n  \"version\": \"0.3.0\",\n  \"git_url\": \"https://github.com/OCR-D/ocrd_tesserocr\",\n  \"dockerhub\": \"ocrd/tesserocr\",\n  \"tools\": {\n    \"ocrd-tesserocr-deskew\": {\n      \"executable\": \"ocrd-tesserocr-deskew\",\n      \"categories\": [\"Image preprocessing\"],\n      \"description\": \"Deskew pages or regions\",\n      \"input_file_grp\": [\n        \"OCR-D-IMG\",\n        \"OCR-D-SEG-BLOCK\"\n      ],\n      \"output_file_grp\": [\n        \"OCR-D-DESKEW-BLOCK\"\n      ],\n      \"steps\": [\"preprocessing/optimization/deskewing\"],\n      \"parameters\": {\n        \"operation_level\": {\n          \"type\": \"string\",\n          \"enum\": [\"page\",\"region\"],\n          \"default\": \"region\",\n          \"description\": \"PAGE XML hierarchy level to operate on\"\n        }\n      }\n    },\n    \"ocrd-tesserocr-recognize\": {\n      \"executable\": \"ocrd-tesserocr-recognize\",\n      \"categories\": [\"Text recognition and optimization\"],\n      \"description\": \"Recognize text in lines with tesseract\",\n      \"input_file_grp\": [\n        \"OCR-D-SEG-BLOCK\",\n        \"OCR-D-SEG-LINE\",\n        \"OCR-D-SEG-WORD\",\n        \"OCR-D-SEG-GLYPH\"\n      ],\n      \"output_file_grp\": [\n        \"OCR-D-OCR-TESS\"\n      ],\n      \"steps\": [\"recognition/text-recognition\"],\n      \"parameters\": {\n        \"textequiv_level\": {\n          \"type\": \"string\",\n          \"enum\": [\"region\", \"line\", \"word\", \"glyph\"],\n          \"default\": \"line\",\n          \"description\": \"PAGE XML hierarchy level to add the TextEquiv results to (requires existing layout annotation up to one level above that)\"\n        },\n        \"overwrite_words\": {\n          \"type\": \"boolean\",\n          \"default\": false,\n          \"description\": \"remove existing layout and text annotation below the TextLine level (regardless of textequiv_level)\"\n        },\n        \"model\": {\n          \"type\": \"string\",\n          \"description\": \"tessdata model to apply (an ISO 639-3 language specification or some other basename, e.g. 
deu-frak or Fraktur)\"\n        }\n      }\n    },\n     \"ocrd-tesserocr-segment-region\": {\n      \"executable\": \"ocrd-tesserocr-segment-region\",\n      \"categories\": [\"Layout analysis\"],\n      \"description\": \"Segment regions into lines with tesseract\",\n      \"input_file_grp\": [\n        \"OCR-D-IMG\"\n      ],\n      \"output_file_grp\": [\n        \"OCR-D-SEG-BLOCK\"\n      ],\n      \"steps\": [\"layout/segmentation/region\"],\n      \"parameters\": {\n        \"overwrite_regions\": {\n          \"type\": \"boolean\",\n          \"default\": true,\n          \"description\": \"remove existing layout and text annotation below the Page level\"\n        },\n        \"padding\": {\n          \"type\": \"number\",\n          \"format\": \"integer\",\n          \"description\": \"extend detected region rectangles by this many (true) pixels\",\n          \"default\": 8\n        },\n        \"crop_polygons\": {\n          \"type\": \"boolean\",\n          \"default\": false,\n          \"description\": \"annotate polygon coordinates instead of rectangles, and create cropped AlternativeImage masked by the polygon outlines\"\n        },\n        \"find_tables\": {\n          \"type\": \"boolean\",\n          \"default\": true,\n          \"description\": \"recognise table regions (textord_tabfind_find_tables)\"\n        }\n      }\n    },\n    \"ocrd-tesserocr-segment-line\": {\n      \"executable\": \"ocrd-tesserocr-segment-line\",\n      \"categories\": [\"Layout analysis\"],\n      \"description\": \"Segment page into regions with tesseract\",\n      \"input_file_grp\": [\n        \"OCR-D-SEG-BLOCK\"\n      ],\n      \"output_file_grp\": [\n        \"OCR-D-SEG-LINE\"\n      ],\n      \"steps\": [\"layout/segmentation/line\"],\n      \"parameters\": {\n        \"overwrite_lines\": {\n          \"type\": \"boolean\",\n          \"default\": true,\n          \"description\": \"remove existing layout and text annotation below the TextRegion level\"\n        }\n      }\n    },\n    \"ocrd-tesserocr-segment-word\": {\n      \"executable\": \"ocrd-tesserocr-segment-word\",\n      \"categories\": [\"Layout analysis\"],\n      \"description\": \"Segment lines into words with tesseract\",\n      \"input_file_grp\": [\n        \"OCR-D-SEG-LINE\"\n      ],\n      \"output_file_grp\": [\n        \"OCR-D-SEG-WORD\"\n      ],\n      \"steps\": [\"layout/segmentation/word\"],\n      \"parameters\": {\n        \"overwrite_words\": {\n          \"type\": \"boolean\",\n          \"default\": true,\n          \"description\": \"remove existing layout and text annotation below the TextLine level\"\n        }\n      }\n    },\n    \"ocrd-tesserocr-crop\": {\n      \"executable\": \"ocrd-tesserocr-crop\",\n      \"categories\": [\"Image preprocessing\"],\n      \"description\": \"Poor man's cropping with tesseract\",\n      \"input_file_grp\": [\n\t\"OCR-D-IMG\"\n      ],\n      \"output_file_grp\": [\n\t\"OCR-D-IMG-CROPPED\"\n      ],\n      \"steps\": [\"preprocessing/optimization/cropping\"],\n      \"parameters\" : {\n        \"padding\": {\n          \"type\": \"number\",\n          \"format\": \"integer\",\n          \"description\": \"extend detected border by this many (true) pixels on every side\",\n          \"default\": 4\n        }\n      }\n    },\n    \"ocrd-tesserocr-binarize\": {\n      \"executable\": \"ocrd-tesserocr-binarize\",\n      \"categories\": [\"Image preprocessing\"],\n      \"description\": \"Binarize regions or lines\",\n      \"input_file_grp\": [\n        
\"OCR-D-IMG\",\n        \"OCR-D-SEG-BLOCK\",\n        \"OCR-D-SEG-LINE\"\n      ],\n      \"output_file_grp\": [\n        \"OCR-D-BIN-BLOCK\",\n        \"OCR-D-BIN-LINE\"\n      ],\n      \"steps\": [\"preprocessing/optimization/binarization\"],\n      \"parameters\": {\n        \"operation_level\": {\n          \"type\": \"string\",\n          \"enum\": [\"region\", \"line\"],\n          \"default\": \"region\",\n          \"description\": \"PAGE XML hierarchy level to operate on\"\n        }\n      }\n    }\n  }\n}\n",
-            "setup.py": "# -*- coding: utf-8 -*-\n\"\"\"\nInstalls five executables:\n\n    - ocrd_tesserocr_recognize\n    - ocrd_tesserocr_segment_region\n    - ocrd_tesserocr_segment_line\n    - ocrd_tesserocr_segment_word\n    - ocrd_tesserocr_crop\n    - ocrd_tesserocr_deskew\n    - ocrd_tesserocr_binarize\n\"\"\"\nimport codecs\n\nfrom setuptools import setup, find_packages\n\nsetup(\n    name='ocrd_tesserocr',\n    version='0.4.0',\n    description='Tesserocr bindings',\n    long_description=codecs.open('README.rst', encoding='utf-8').read(),\n    author='Konstantin Baierer, Kay-Michael W\u00fcrzner, Robert Sachunsky',\n    author_email='unixprog@gmail.com, wuerzner@gmail.com, sachunsky@informatik.uni-leipzig.de',\n    url='https://github.com/OCR-D/ocrd_tesserocr',\n    license='Apache License 2.0',\n    packages=find_packages(exclude=('tests', 'docs')),\n    install_requires=open('requirements.txt').read().split('\\n'),\n    package_data={\n        '': ['*.json', '*.yml', '*.yaml'],\n    },\n    entry_points={\n        'console_scripts': [\n            'ocrd-tesserocr-recognize=ocrd_tesserocr.cli:ocrd_tesserocr_recognize',\n            'ocrd-tesserocr-segment-region=ocrd_tesserocr.cli:ocrd_tesserocr_segment_region',\n            'ocrd-tesserocr-segment-line=ocrd_tesserocr.cli:ocrd_tesserocr_segment_line',\n            'ocrd-tesserocr-segment-word=ocrd_tesserocr.cli:ocrd_tesserocr_segment_word',\n            'ocrd-tesserocr-crop=ocrd_tesserocr.cli:ocrd_tesserocr_crop',\n            'ocrd-tesserocr-deskew=ocrd_tesserocr.cli:ocrd_tesserocr_deskew',\n            'ocrd-tesserocr-binarize=ocrd_tesserocr.cli:ocrd_tesserocr_binarize',\n        ]\n    },\n)\n"
+            "setup.py": "# -*- coding: utf-8 -*-\n\"\"\"\nInstalls five executables:\n\n    - ocrd_tesserocr_recognize\n    - ocrd_tesserocr_segment_region\n    - ocrd_tesserocr_segment_line\n    - ocrd_tesserocr_segment_word\n    - ocrd_tesserocr_crop\n    - ocrd_tesserocr_deskew\n    - ocrd_tesserocr_binarize\n\"\"\"\nimport codecs\n\nfrom setuptools import setup, find_packages\n\nsetup(\n    name='ocrd_tesserocr',\n    version='0.4.1',\n    description='Tesserocr bindings',\n    long_description=codecs.open('README.rst', encoding='utf-8').read(),\n    author='Konstantin Baierer, Kay-Michael W\u00fcrzner, Robert Sachunsky',\n    author_email='unixprog@gmail.com, wuerzner@gmail.com, sachunsky@informatik.uni-leipzig.de',\n    url='https://github.com/OCR-D/ocrd_tesserocr',\n    license='Apache License 2.0',\n    packages=find_packages(exclude=('tests', 'docs')),\n    install_requires=open('requirements.txt').read().split('\\n'),\n    package_data={\n        '': ['*.json', '*.yml', '*.yaml'],\n    },\n    entry_points={\n        'console_scripts': [\n            'ocrd-tesserocr-recognize=ocrd_tesserocr.cli:ocrd_tesserocr_recognize',\n            'ocrd-tesserocr-segment-region=ocrd_tesserocr.cli:ocrd_tesserocr_segment_region',\n            'ocrd-tesserocr-segment-line=ocrd_tesserocr.cli:ocrd_tesserocr_segment_line',\n            'ocrd-tesserocr-segment-word=ocrd_tesserocr.cli:ocrd_tesserocr_segment_word',\n            'ocrd-tesserocr-crop=ocrd_tesserocr.cli:ocrd_tesserocr_crop',\n            'ocrd-tesserocr-deskew=ocrd_tesserocr.cli:ocrd_tesserocr_deskew',\n            'ocrd-tesserocr-binarize=ocrd_tesserocr.cli:ocrd_tesserocr_binarize',\n        ]\n    },\n)\n"
         },
         "git": {
-            "last_commit": "Mon Aug 26 05:16:59 2019 +0200",
-            "number_of_commits": "243"
+            "last_commit": "Thu Sep 26 15:06:11 2019 +0200",
+            "number_of_commits": "252"
         },
         "name": "ocrd_tesserocr",
         "ocrd_tool": {
@@ -808,186 +923,745 @@
     {
         "files": {
             "Dockerfile": null,
-            "README.md": "# Document Preprocessing and Segmentation\n\n[![CircleCI](https://circleci.com/gh/mjenckel/OCR-D-LAYoutERkennung.svg?style=svg)](https://circleci.com/gh/mjenckel/OCR-D-LAYoutERkennung)\n\n> Tools for preprocessing scanned images for OCR\n\n# Installing\n\nTo install anyBaseOCR dependencies system-wide:\n\n    $ sudo pip install .\n\nAlternatively, dependencies can be installed into a Virtual Environment:\n\n    $ virtualenv venv\n    $ source venv/bin/activate\n    $ pip install -e .\n\n## Tools included\n\nTo see how to run binarization, deskew, crop and dewarp, text/non-text segmentation and textline segmentation methods, please follow corresponding below files for a detailed description :\n\n   * [README_binarize.md](https://github.com/mjenckel/OCR-D-LAYoutERkennung/tree/master/docs/README_binarize.md) instruction for binarization method\n   * [README_deskew.md](https://github.com/mjenckel/OCR-D-LAYoutERkennung/tree/master/docs/README_deskew.md) instruction for deskew method\n   * [README_cropping.md](https://github.com/mjenckel/OCR-D-LAYoutERkennung/tree/master/docs/README_cropping.md) instruction for cropping method\n   * [README_dewarp.md](https://github.com/mjenckel/OCR-D-LAYoutERkennung/tree/master/docs/README_dewarp.md) instruction for dewarp method\n   * [README_tiseg.md](https://github.com/mjenckel/OCR-D-LAYoutERkennung/tree/master/docs/README_tigseg.md) instruction for text/non-text segmentation method\n   * [README_textline.md](https://github.com/mjenckel/OCR-D-LAYoutERkennung/tree/master/docs/README_textline.md) instruction for textline segmentation method\n\n## Binarizer\n\n### Method Behaviour \n This function takes a scanned colored /gray scale document image as input and do the black and white binarize image.\n \n #### Usage:\n```sh\nocrd-anybaseocr-binarize -m (path to METs input file) -I (Input group name) -O (Output group name) [-p (path to parameter file) -o (METs output filename)]\n```\n\n#### Example: \n```sh\nocrd-anybaseocr-binarize \\\n   -m mets.xml \\\n   -I OCR-D-IMG \\\n   -O OCR-D-IMG-BIN\n```\n\n## Deskewer\n\n### Method Behaviour \n This function takes a document image as input and do the skew correction of that document.\n \n #### Usage:\n```sh\nocrd-anybaseocr-deskew -m (path to METs input file) -I (Input group name) -O (Output group name) [-p (path to parameter file) -o (METs output filename)]\n```\n\n#### Example: \n```sh\nocrd-anybaseocr-deskew \\\n  -m mets.xml \\\n  -I OCR-D-IMG-BIN \\\n  -O OCR-D-IMG-DESKEW\n```\n\n## Cropper\n\n### Method Behaviour \n This function takes a document image as input and crops/selects the page content area only (that's mean remove textual noise as well as any other noise around page content area)\n \n #### Usage:\n```sh\nocrd-anybaseocr-cropping -m (path to METs input file) -I (Input group name) -O (Output group name) [-p (path to parameter file) -o (METs output filename)]\n```\n\n#### Example: \n```sh\nocrd-anybaseocr-cropping \\\n   -m mets.xml \\\n   -I OCR-D-IMG-DESKEW \\\n   -O OCR-D-IMG-CROP\n```\n\n\n## Dewarper\n\n### Method Behaviour \n This function takes a document image as input and make the text line straight if its curved.\n \n #### Usage:\n```sh\nocrd-anybaseocr-dewarp -m (path to METs input file) -I (Input group name) -O (Output group name) [-p (path to parameter file) -o (METs output filename)]\n```\n\n\n#### Example: \n```sh\nCUDA_VISIBLE_DEVICES=0 ocrd-anybaseocr-dewarp \\\n   -m mets.xml \\\n   -I OCR-D-IMG-CROP \\\n   -O OCR-D-IMG-DEWARP\n   -p params.json \n```\n\n## 
Text/Non-Text Segmenter\n\n### Method Behaviour \n This function takes a document image as an input and separates the text and non-text part from the input document image.\n \n #### Usage:\n```sh\nocrd-anybaseocr-tiseg -m (path to METs input file) -I (Input group name) -O (Output group name) [-p (path to parameter file) -o (METs output filename)]\n```\n\n#### Example: \n```sh\nocrd-anybaseocr-tiseg \\\n\t-m mets.xml \\\n\t-I OCR-D-IMG-CROP \\\n\t-O OCR-D-IMG-TISEG\n```\n\n## Textline Segmenter\n\n### Method Behaviour \n This function takes a cropped document image as an input and segment the image into textline images.\n \n #### Usage:\n```sh\nocrd-anybaseocr-textline -m (path to METs input file) -I (Input group name) -O (Output group name) [-p (path to parameter file) -o (METs output filename)]\n```\n\n#### Example: \n```sh\nocrd-anybaseocr-textline \\\n\t-m mets.xml \\\n\t-I OCR-D-IMG-TISEG \\\n\t-O OCR-D-IMG-TL\n```\n\n\n## Testing\n\nTo test the tools, download [OCR-D/assets](https://github.com/OCR-D/assets). In\nparticular, the code is tested with the\n[dfki-testdata](https://github.com/OCR-D/assets/tree/master/data/dfki-testdata)\ndataset.\n\nRun `make test` to run all tests.\n\n## License\n\n\n```\n Licensed under the Apache License, Version 2.0 (the \"License\");\n you may not use this file except in compliance with the License.\n You may obtain a copy of the License at\n\n     http://www.apache.org/licenses/LICENSE-2.0\n\n Unless required by applicable law or agreed to in writing, software\n distributed under the License is distributed on an \"AS IS\" BASIS,\n WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n See the License for the specific language governing permissions and\n limitations under the License.\n ```\n",
-            "ocrd-tool.json": "{\n  \"git_url\": \"https://github.com/mjenckel/LAYoutERkennung/\",\n  \"version\": \"0.0.1\",\n  \"tools\": {\n    \"ocrd-anybaseocr-binarize\": {\n      \"executable\": \"ocrd-anybaseocr-binarize\",\n      \"description\": \"Binarize images with the algorithm from ocropy\",\n      \"categories\": [\"Image preprocessing\"],\n      \"steps\": [\"preprocessing/optimization/binarization\"],\n      \"input_file_grp\": [\"OCR-D-IMG\"],\n      \"output_file_grp\": [\"OCR-D-IMG-BIN\"],\n      \"parameters\": {\n        \"nocheck\":         {\"type\": \"boolean\",                     \"default\": false, \"description\": \"disable error checking on inputs\"},\n        \"show\":            {\"type\": \"boolean\",                     \"default\": false, \"description\": \"display final results\"},\n        \"raw_copy\":        {\"type\": \"boolean\",                     \"default\": false, \"description\": \"also copy the raw image\"},\n        \"gray\":            {\"type\": \"boolean\",                     \"default\": false, \"description\": \"force grayscale processing even if image seems binary\"},\n        \"bignore\":         {\"type\": \"number\", \"format\": \"float\",   \"default\": 0.1,   \"description\": \"ignore this much of the border for threshold estimation\"},\n        \"debug\":           {\"type\": \"number\", \"format\": \"integer\", \"default\": 0,     \"description\": \"display intermediate results\"},\n        \"escale\":          {\"type\": \"number\", \"format\": \"float\",   \"default\": 1.0,   \"description\": \"scale for estimating a mask over the text region\"},\n        \"hi\":              {\"type\": \"number\", \"format\": \"float\",   \"default\": 90,    \"description\": \"percentile for white estimation\"},\n        \"lo\":              {\"type\": \"number\", \"format\": \"float\",   \"default\": 5,     \"description\": \"percentile for black estimation\"},\n        \"perc\":            {\"type\": \"number\", \"format\": \"float\",   \"default\": 80,    \"description\": \"percentage for filters\"},\n        \"range\":           {\"type\": \"number\", \"format\": \"integer\", \"default\": 20,    \"description\": \"range for filters\"},\n        \"threshold\":       {\"type\": \"number\", \"format\": \"float\",   \"default\": 0.5,   \"description\": \"threshold, determines lightness\"},\n        \"zoom\":            {\"type\": \"number\", \"format\": \"float\",   \"default\": 0.5,   \"description\": \"zoom for page background estimation, smaller=faster\"},\n        \"operation_level\": {\"type\": \"string\", \"enum\": [\"page\",\"region\", \"line\"], \"default\": \"page\",\"description\": \"PAGE XML hierarchy level to operate on\"}\n      }\n    },\n    \"ocrd-anybaseocr-deskew\": {\n      \"executable\": \"ocrd-anybaseocr-deskew\",\n      \"description\": \"Deskew images with the algorithm from ocropy\",\n      \"categories\": [\"Image preprocessing\"],\n      \"steps\": [\"preprocessing/optimization/deskewing\"],\n      \"input_file_grp\": [\"OCR-D-IMG-BIN\"],\n      \"output_file_grp\": [\"OCR-D-IMG-DESKEW\"],\n      \"parameters\": {\n        \"escale\":    {\"type\": \"number\", \"format\": \"float\",   \"default\": 1.0, \"description\": \"scale for estimating a mask over the text region\"},\n        \"bignore\":   {\"type\": \"number\", \"format\": \"float\",   \"default\": 0.1, \"description\": \"ignore this much of the border for threshold estimation\"},\n        \"threshold\": {\"type\": \"number\", \"format\": \"float\",   
\"default\": 0.5, \"description\": \"threshold, determines lightness\"},\n        \"maxskew\":   {\"type\": \"number\", \"format\": \"float\",   \"default\": 1.0, \"description\": \"skew angle estimation parameters (degrees)\"},\n        \"skewsteps\": {\"type\": \"number\", \"format\": \"integer\", \"default\": 8,   \"description\": \"steps for skew angle estimation (per degree)\"},\n        \"debug\":     {\"type\": \"number\", \"format\": \"integer\", \"default\": 0,   \"description\": \"display intermediate results\"},\n        \"parallel\":  {\"type\": \"number\", \"format\": \"integer\", \"default\": 0,   \"description\": \"???\"},\n        \"lo\":        {\"type\": \"number\", \"format\": \"integer\", \"default\": 5,   \"description\": \"percentile for black estimation\"},\n        \"hi\":        {\"type\": \"number\", \"format\": \"integer\", \"default\": 90,   \"description\": \"percentile for white estimation\"},\n        \"operation_level\": {\"type\": \"string\", \"enum\": [\"page\",\"region\", \"line\"], \"default\": \"page\",\"description\": \"PAGE XML hierarchy level to operate on\"}\n      }\n    },\n    \"ocrd-anybaseocr-crop\": {\n      \"executable\": \"ocrd-anybaseocr-crop\",\n      \"description\": \"Image crop using non-linear processing\",\n      \"categories\": [\"Image preprocessing\"],\n      \"steps\": [\"preprocessing/optimization/cropping\"],\n      \"input_file_grp\": [\"OCR-D-IMG-DESKEW\"],\n      \"output_file_grp\": [\"OCR-D-IMG-CROP\"],\n      \"parameters\": {\n        \"colSeparator\":  {\"type\": \"number\", \"format\": \"float\", \"default\": 0.04, \"description\": \"consider space between column. 25% of width\"},\n        \"maxRularArea\":  {\"type\": \"number\", \"format\": \"float\", \"default\": 0.3, \"description\": \"Consider maximum rular area\"},\n        \"minArea\":       {\"type\": \"number\", \"format\": \"float\", \"default\": 0.05, \"description\": \"rular position in below\"},\n        \"minRularArea\":  {\"type\": \"number\", \"format\": \"float\", \"default\": 0.01, \"description\": \"Consider minimum rular area\"},\n        \"positionBelow\": {\"type\": \"number\", \"format\": \"float\", \"default\": 0.75, \"description\": \"rular position in below\"},\n        \"positionLeft\":  {\"type\": \"number\", \"format\": \"float\", \"default\": 0.4, \"description\": \"rular position in left\"},\n        \"positionRight\": {\"type\": \"number\", \"format\": \"float\", \"default\": 0.6, \"description\": \"rular position in right\"},\n        \"rularRatioMax\": {\"type\": \"number\", \"format\": \"float\", \"default\": 10.0, \"description\": \"rular position in below\"},\n        \"rularRatioMin\": {\"type\": \"number\", \"format\": \"float\", \"default\": 3.0, \"description\": \"rular position in below\"},\n        \"rularWidth\":    {\"type\": \"number\", \"format\": \"float\", \"default\": 0.95, \"description\": \"maximum rular width\"},\n        \"operation_level\": {\"type\": \"string\", \"enum\": [\"page\",\"region\", \"line\"], \"default\": \"page\",\"description\": \"PAGE XML hierarchy level to operate on\"}\n      }\n    },\n    \"ocrd-anybaseocr-dewarp\": {\n      \"executable\": \"ocrd-anybaseocr-dewarp\",\n      \"description\": \"dewarp image with anyBaseOCR\",\n      \"categories\": [\"Image preprocessing\"],\n      \"steps\": [\"preprocessing/optimization/dewarping\"],\n      \"input_file_grp\": [\"OCR-D-IMG-CROP\"],\n      \"output_file_grp\": [\"OCR-D-IMG-DEWARP\"],\n      \"parameters\": {\n        \"imgresize\":    { 
\"type\": \"string\",                      \"default\": \"resize_and_crop\", \"description\": \"run on original size image\"},\n        \"pix2pixHD\":    { \"type\": \"string\",                      \"required\": true, \"description\": \"Path to pix2pixHD library\"},\n        \"gpu_id\":       { \"type\": \"number\", \"format\": \"integer\", \"default\": 0,    \"description\": \"gpu id\"},\n        \"resizeHeight\": { \"type\": \"number\", \"format\": \"integer\", \"default\": 1024, \"description\": \"resized image height\"},\n        \"resizeWidth\":  { \"type\": \"number\", \"format\": \"integer\", \"default\": 1024, \"description\": \"resized image width\"}\n      }\n    },\n    \"ocrd-anybaseocr-tiseg\": {\n      \"executable\": \"ocrd-anybaseocr-tiseg\",\n      \"input_file_grp\": [\"OCR-D-IMG-CROP\"],\n      \"output_file_grp\": [\"OCR-D-SEG-TISEG\"],\n      \"categories\": [\"Layout analysis\"],\n      \"steps\": [\"layout/segmentation/text-image\"],\n      \"description\": \"separate text and non-text part with anyBaseOCR\",\n      \"parameters\": {\n      }\n    },\n    \"ocrd-anybaseocr-textline\": {\n      \"executable\": \"ocrd-anybaseocr-textline\",\n      \"input_file_grp\": [\"OCR-D-SEG-TISEG\"],\n      \"output_file_grp\": [\"OCR-D-SEG-LINE-ANY\"],\n      \"categories\": [\"Layout analysis\"],\n      \"steps\": [\"layout/segmentation/line\"],\n      \"description\": \"separate each text line\",\n      \"parameters\": {\n        \"minscale\":    {\"type\": \"number\", \"format\": \"float\", \"default\": 12.0, \"description\": \"minimum scale permitted\"},\n        \"maxlines\":    {\"type\": \"number\", \"format\": \"float\", \"default\": 300, \"description\": \"non-standard scaling of horizontal parameters\"},\n        \"scale\":       {\"type\": \"number\", \"format\": \"float\", \"default\": 0.0, \"description\": \"the basic scale of the document (roughly, xheight) 0=automatic\"},\n        \"hscale\":      {\"type\": \"number\", \"format\": \"float\", \"default\": 1.0, \"description\": \"non-standard scaling of horizontal parameters\"},\n        \"vscale\":      {\"type\": \"number\", \"format\": \"float\", \"default\": 1.7, \"description\": \"non-standard scaling of vertical parameters\"},\n        \"threshold\":   {\"type\": \"number\", \"format\": \"float\", \"default\": 0.2, \"description\": \"baseline threshold\"},\n        \"noise\":       {\"type\": \"number\", \"format\": \"integer\", \"default\": 8, \"description\": \"noise threshold for removing small components from lines\"},\n        \"usegauss\":    {\"type\": \"boolean\", \"default\": false, \"description\": \"use gaussian instead of uniform\"},\n        \"maxseps\":     {\"type\": \"number\", \"format\": \"integer\", \"default\": 2, \"description\": \"maximum black column separators\"},\n        \"sepwiden\":    {\"type\": \"number\", \"format\": \"integer\", \"default\": 10, \"description\": \"widen black separators (to account for warping)\"},\n        \"blackseps\":   {\"type\": \"boolean\", \"default\": false, \"description\": \"also check for black column separators\"},\n        \"maxcolseps\":  {\"type\": \"number\", \"format\": \"integer\", \"default\": 2, \"description\": \"maximum # whitespace column separators\"},\n        \"csminaspect\": {\"type\": \"number\", \"format\": \"float\", \"default\": 1.1, \"description\": \"minimum aspect ratio for column separators\"},\n        \"csminheight\": {\"type\": \"number\", \"format\": \"float\", \"default\": 6.5, \"description\": \"minimum column height 
(units=scale)\"},\n        \"pad\":         {\"type\": \"number\", \"format\": \"integer\", \"default\": 3, \"description\": \"padding for extracted lines\"},\n        \"expand\":      {\"type\": \"number\", \"format\": \"integer\", \"default\": 3, \"description\": \"expand mask for grayscale extraction\"},\n        \"parallel\":    {\"type\": \"number\", \"format\": \"integer\", \"default\": 0, \"description\": \"number of CPUs to use\"},\n        \"libpath\":     {\"type\": \"string\", \"default\": \".\", \"description\": \"Library Path for C Executables\"}\n      }\n    },\n    \"ocrd-anybaseocr-layout-analysis\": {\n      \"executable\": \"ocrd-anybaseocr-layout-analysis\",\n      \"input_file_grp\": [\"OCR-D-IMG-CROP\"],\n      \"output_file_grp\": [\"OCR-D-SEG-LAYOUT\"],\n      \"categories\": [\"Layout analysis\"],\n      \"steps\": [\"layout/segmentation/text-image\"],\n      \"description\": \"Analysis of the input document\",\n      \"parameters\": {\n        \"batch_size\":         {\"type\": \"number\", \"format\": \"integer\", \"default\": 4, \"description\": \"Batch size for generating test images\"},\n        \"model_path\":         { \"type\": \"string\",                     \"required\": true, \"description\": \"Path to Layout Structure Classification Model\"},\n        \"class_mapping_path\": { \"type\": \"string\",                     \"required\": true, \"description\": \"Path to Layout Structure Classes\"}\n      }\n    },\n    \"ocrd-anybaseocr-block-segmentation\": {\n      \"executable\": \"ocrd-anybaseocr-block-segmentation\",\n      \"input_file_grp\": [\"OCR-D-IMG\"],\n      \"output_file_grp\": [\"OCR-D-BLOCK-SEGMENT\"],\n      \"categories\": [\"Layout analysis\"],\n      \"steps\": [\"layout/segmentation/text-image\"],\n      \"description\": \"Analysis of the input document\",\n      \"parameters\": {        \n        \"block_segmentation_model\":   { \"type\": \"string\",                     \"required\": true, \"description\": \"Path to Layout Structure Classification Model\"},\n        \"block_segmentation_weights\": { \"type\": \"string\",                     \"required\": true, \"description\": \"Path to Layout Structure Classes\"}\n      }\n    }       \n  }\n}\n",
-            "setup.py": "# -*- coding: utf-8 -*-\nfrom setuptools import setup, find_packages\n\nsetup(\n    name='ocrd-anybaseocr',\n    version='v0.0.1',\n    author=\"DFKI\",\n    author_email=\"Saqib.Bukhari@dfki.de, Mohammad_mohsin.reza@dfki.de\",\n    url=\"https://github.com/mjenckel/LAYoutERkennung\",\n    license='Apache License 2.0',\n    long_description=open('README.md').read(),\n    long_description_content_type='text/markdown',\n    install_requires=open('requirements.txt').read().split('\\n'),\n    packages=find_packages(exclude=[\"work_dir\", \"src\"]),\n    package_data={\n        '': ['*.json']\n    },\n    entry_points={\n        'console_scripts': [\n            'ocrd-anybaseocr-binarize           = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_binarize',\n            'ocrd-anybaseocr-deskew             = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_deskew',\n            'ocrd-anybaseocr-crop               = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_cropping',        \n            'ocrd-anybaseocr-dewarp             = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_dewarp',\n            'ocrd-anybaseocr-tiseg              = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_tiseg',\n            'ocrd-anybaseocr-textline           = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_textline',\n            'ocrd-anybaseocr-layout-analysis    = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_layout_analysis',\n            'ocrd-anybaseocr-block-segmentation = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_block_segmentation'\n        ]\n    },\n)\n"
+            "README.md": "![build status](https://travis-ci.org/cisocrgroup/cis-ocrd-py.svg?branch=dev)\n# cis-ocrd-py\n\n[CIS](http://www.cis.lmu.de) [OCR-D](http://ocr-d.de) command line tools\n\n## General usage\n\n### Essential system packages\n\n```sh\nsudo apt-get install \\\n  git \\\n  build-essential \\\n  python3 python3-pip \\\n  libxml2-dev \\\n  default-jdk\n```\n\n### Virtualenv\n\nUse `virtualenv` to install dependencies:\n* `virtualenv -p python3.6 env`\n* `source env/bin/activate`\n* `pip install -e path/to/dir/containing/setup.py`\n\nUse `deactivate` to deactivate the virtualenv again.\n\n### OCR-D workspace\n\n* Create a new (empty) workspace: `ocrd workspace init workspace-dir`\n* cd into `workspace-dir`\n* Add a new file to the workspace: `ocrd workspace add file -G group -i id\n  -m mimetype`\n\n### Tests\n\nIssue `make test` to run the automated test suite. The tests depend on\nthe following tools:\n\n* [wget](https://www.gnu.org/software/wget/)\n* [envsubst](https://linux.die.net/man/1/envsubst)\n\nYou can run individual test cases using the `run_*_test.bash` scripts in\nthe tests directory. Use the `--persistent` or `-p` flag to keep\ntemporary directories.\n\nYou can override the temporary directory by setting the `TMP_DIR` environment\nvariable.\n\n## Tools\n\n### ocrd-cis-align\n\nThe alignment tool line-aligns multiple file groups. It can be used to\nalign the results of multiple OCRs with their respective ground truth.\n\nThe tool expects a comma-separated list of input file groups, the\ncorresponding output file group, and the URL of the configuration file:\n\n```sh\nocrd-cis-align \\\n  --input-file-grp 'ocr1,ocr2,gt' \\\n  --output-file-grp 'ocr1+ocr2+gt' \\\n  --mets mets.xml \\\n  --parameter file:///path/to/config.json\n```\n\n### ocrd-cis-ocropy-train\nThe ocropy-train tool can be used to train LSTM models.\nIt takes ground truth from the workspace and saves (image+text) snippets from the corresponding pages.\nA model is then trained on all snippets for 1 million randomized iterations (or the number given in the parameter file).\n```sh\nocrd-cis-ocropy-train \\\n  --input-file-grp OCR-D-GT-SEG-LINE \\\n  --mets mets.xml \\\n  --parameter file:///path/to/config.json\n```\n\n### ocrd-cis-ocropy-clip\nThe ocropy-clip tool can be used to remove intrusions of neighbouring segments in regions / lines of a workspace.\nIt runs an (ad-hoc binarization and) connected component analysis on every text region / line of every PAGE in the input file group, as well as its overlapping neighbours, and for each binary object of conflict, determines whether it belongs to the neighbour, and can therefore be clipped to white. 
It references the resulting segment image files in the output PAGE (as AlternativeImage).\n```sh\nocrd-cis-ocropy-clip \\\n  --input-file-grp OCR-D-SEG-LINE \\\n  --output-file-grp OCR-D-SEG-LINE-CLIP \\\n  --mets mets.xml \\\n  --parameter file:///path/to/config.json\n```\n\n### ocrd-cis-ocropy-resegment\nThe ocropy-resegment tool can be used to remove overlap between lines of a workspace.\nIt runs an (ad-hoc binarization and) line segmentation on every text region of every PAGE in the input file group, and for each line already annotated, determines the label of largest extent within the original coordinates (polygon outline) in that line, and annotates the resulting coordinates in the output PAGE.\n```sh\nocrd-cis-ocropy-resegment \\\n  --input-file-grp OCR-D-SEG-LINE \\\n  --output-file-grp OCR-D-SEG-LINE-RES \\\n  --mets mets.xml \\\n  --parameter file:///path/to/config.json\n```\n\n### ocrd-cis-ocropy-segment\nThe ocropy-segment tool can be used to segment regions into lines.\nIt runs an (ad-hoc binarization and) line segmentation on every text region of every PAGE in the input file group, and adds a TextLine element with the resulting polygon outline to the annotation of the output PAGE.\n```sh\nocrd-cis-ocropy-segment \\\n  --input-file-grp OCR-D-SEG-BLOCK \\\n  --output-file-grp OCR-D-SEG-LINE \\\n  --mets mets.xml \\\n  --parameter file:///path/to/config.json\n```\n\n### ocrd-cis-ocropy-deskew\nThe ocropy-deskew tool can be used to deskew pages / regions of a workspace.\nIt runs the Ocropy thresholding and deskewing estimation on every segment of every PAGE in the input file group and annotates the orientation angle in the output PAGE.\n```sh\nocrd-cis-ocropy-deskew \\\n  --input-file-grp OCR-D-SEG-LINE \\\n  --output-file-grp OCR-D-SEG-LINE-DES \\\n  --mets mets.xml \\\n  --parameter file:///path/to/config.json\n```\n\n### ocrd-cis-ocropy-denoise\nThe ocropy-denoise tool can be used to despeckle pages / regions / lines of a workspace.\nIt runs the Ocropy \"nlbin\" denoising on every segment of every PAGE in the input file group and references the resulting segment image files in the output PAGE (as AlternativeImage).\n```sh\nocrd-cis-ocropy-denoise \\\n  --input-file-grp OCR-D-SEG-LINE-DES \\\n  --output-file-grp OCR-D-SEG-LINE-DEN \\\n  --mets mets.xml \\\n  --parameter file:///path/to/config.json\n```\n\n### ocrd-cis-ocropy-binarize\nThe ocropy-binarize tool can be used to binarize, denoise and deskew pages / regions / lines of a workspace.\nIt runs the Ocropy \"nlbin\" adaptive thresholding, deskewing estimation and denoising on every segment of every PAGE in the input file group and references the resulting segment image files in the output PAGE (as AlternativeImage). (If a deskewing angle has already been annotated in a region, the tool respects that and rotates accordingly.) 
Images can also be produced grayscale-normalized.\n```sh\nocrd-cis-ocropy-binarize \\\n  --input-file-grp OCR-D-SEG-LINE-DES \\\n  --output-file-grp OCR-D-SEG-LINE-BIN \\\n  --mets mets.xml \\\n  --parameter file:///path/to/config.json\n```\n\n### ocrd-cis-ocropy-dewarp\nThe ocropy-dewarp tool can be used to dewarp text lines of a workspace.\nIt runs the Ocropy baseline estimation and dewarping on every line in every text region of every PAGE in the input file group and references the resulting line image files in the output PAGE (as AlternativeImage).\n```sh\nocrd-cis-ocropy-dewarp \\\n  --input-file-grp OCR-D-SEG-LINE-BIN \\\n  --output-file-grp OCR-D-SEG-LINE-DEW \\\n  --mets mets.xml \\\n  --parameter file:///path/to/config.json\n```\n\n### ocrd-cis-ocropy-recognize\nThe ocropy-recognize tool can be used to recognize lines / words / glyphs from pages of a workspace.\nIt runs the Ocropy optical character recognition on every line in every text region of every PAGE in the input file group and adds the resulting text annotation in the output PAGE.\n```sh\nocrd-cis-ocropy-recognize \\\n  --input-file-grp OCR-D-SEG-LINE-DEW \\\n  --output-file-grp OCR-D-OCR-OCRO \\\n  --mets mets.xml \\\n  --parameter file:///path/to/config.json\n```\n\n## All in One Tool\nFor the All in One Tool, install all of the above tools and Tesserocr as explained below.\nThen use it like this:\n```sh\nocrd-cis-aio --parameter file:///path/to/config.json\n```\n\n### Tesserocr\nInstall the essential system packages for Tesserocr:\n```sh\nsudo apt-get install python3-tk \\\n  tesseract-ocr libtesseract-dev libleptonica-dev \\\n  libimage-exiftool-perl libxml2-utils\n```\n\nThen install Tesserocr from: https://github.com/OCR-D/ocrd_tesserocr\n```sh\npip install -r requirements.txt\npip install .\n```\n\nDownload Tesseract models from\nhttps://github.com/tesseract-ocr/tesseract/wiki/Data-Files (or use your\nown models) and place them into /usr/share/tesseract-ocr/4.00/tessdata.\n\nTesserocr v2.4.0 seems broken for Tesseract 4.0.0-beta. Install\nversion v2.3.1 instead: `pip install tesserocr==2.3.1`.\n\n## Workflow configuration\n\nA decent pipeline might look like this:\n\n1. page-level cropping\n2. page-level binarization\n3. page-level deskewing\n4. page-level dewarping\n5. region segmentation\n6. region-level clipping\n7. region-level deskewing\n8. line segmentation\n9. line-level clipping or resegmentation\n10. line-level dewarping\n11. line-level recognition\n12. line-level alignment\n\nIf GT is used, steps 1, 5 and 8 can be omitted. Otherwise, if the segmentation used in steps 5 and 8 does not produce overlapping segments, steps 6 and 9 can be omitted.\n\n## OCR-D links\n\n- [OCR-D](https://ocr-d.github.io)\n- [Github](https://github.com/OCR-D)\n- [Project-page](http://www.ocr-d.de/)\n- [Ground-truth](http://www.ocr-d.de/sites/all/GTDaten/IndexGT.html)\n",
+            "ocrd-tool.json": "{\n\t\"git_url\": \"https://github.com/cisocrgroup/cis-ocrd-py\",\n\t\"version\": \"0.0.1\",\n\t\"tools\": {\n\t\t\"ocrd-cis-aio\": {\n\t\t\t\"executable\": \"ocrd-cis-aio\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Text recognition and optimization\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"postprocessing/alignment/recognition\"\n\t\t\t],\n\t\t\t\"description\": \"All in One Tool\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"tesserparampath\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"required\": true\n\t\t\t\t},\n\t\t\t\t\"ocropyparampath1\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"required\": true\n\t\t\t\t},\n\t\t\t\t\"ocropyparampath2\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"required\": true\n\t\t\t\t},\n\t\t\t\t\"alignparampath\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"required\": true\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-align\": {\n\t\t\t\"executable\": \"ocrd-cis-align\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Text recognition and optimization\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"postprocessing/alignment\"\n\t\t\t],\n\t\t\t\"description\": \"Align multiple OCRs and/or GTs\"\n\t\t},\n\t\t\"ocrd-cis-ocropy-binarize\": {\n\t\t\t\"executable\": \"ocrd-cis-ocropy-binarize\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Image preprocessing\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"preprocessing/optimization/binarization\",\n\t\t\t\t\"preprocessing/optimization/grayscale_normalization\",\n\t\t\t\t\"preprocessing/optimization/deskewing\"\n\t\t\t],\n\t\t\t\"input_file_grp\": [\n\t\t\t\t\"OCR-D-IMG\",\n\t\t\t\t\"OCR-D-SEG-BLOCK\",\n\t\t\t\t\"OCR-D-SEG-LINE\"\n\t\t\t],\n\t\t\t\"output_file_grp\": [\n\t\t\t\t\"OCR-D-IMG-BIN\",\n\t\t\t\t\"OCR-D-SEG-BLOCK\",\n\t\t\t\t\"OCR-D-SEG-LINE\"\n\t\t\t],\n\t\t\t\"description\": \"Binarize (and optionally deskew/despeckle) pages / regions / lines with ocropy\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"method\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"enum\": [\"none\", \"global\", \"otsu\", \"gauss-otsu\", \"ocropy\"],\n\t\t\t\t\t\"description\": \"binarization method to use (only ocropy will include deskewing)\",\n\t\t\t\t\t\"default\": \"ocropy\"\n\t\t\t\t},\n\t\t\t\t\"grayscale\": {\n\t\t\t\t\t\"type\": \"boolean\",\n\t\t\t\t\t\"description\": \"for the ocropy method, produce grayscale-normalized instead of thresholded image\",\n\t\t\t\t\t\"default\": false\n\t\t\t\t},\n\t\t\t\t\"maxskew\": {\n\t\t\t\t\t\"type\": \"number\",\n\t\t\t\t\t\"description\": \"modulus of maximum skewing angle to detect (larger will be slower, 0 will deactivate deskewing)\",\n\t\t\t\t\t\"default\": 0.0\n\t\t\t\t},\n\t\t\t\t\"noise_maxsize\": {\n\t\t\t\t\t\"type\": \"number\",\n\t\t\t\t\t\"description\": \"maximum pixel number for connected components to regard as noise (0 will deactivate denoising)\",\n\t\t\t\t\t\"default\": 0\n\t\t\t\t},\n\t\t\t\t\"level-of-operation\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"enum\": [\"page\", \"region\", \"line\"],\n\t\t\t\t\t\"description\": \"PAGE XML hierarchy level granularity to annotate images for\",\n\t\t\t\t\t\"default\": \"page\"\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-ocropy-deskew\": {\n\t\t\t\"executable\": \"ocrd-cis-ocropy-deskew\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Image preprocessing\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"preprocessing/optimization/deskewing\"\n\t\t\t],\n\t\t\t\"input_file_grp\": [\n\t\t\t\t\"OCR-D-SEG-BLOCK\",\n\t\t\t\t\"OCR-D-SEG-LINE\"\n\t\t\t],\n\t\t\t\"output_file_grp\": 
[\n\t\t\t\t\"OCR-D-SEG-BLOCK\",\n\t\t\t\t\"OCR-D-SEG-LINE\"\n\t\t\t],\n\t\t\t\"description\": \"Deskew regions with ocropy (by annotating orientation angle and adding AlternativeImage)\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"maxskew\": {\n\t\t\t\t\t\"type\": \"number\",\n\t\t\t\t\t\"description\": \"modulus of maximum skewing angle to detect (larger will be slower, 0 will deactivate deskewing)\",\n\t\t\t\t\t\"default\": 5.0\n\t\t\t\t},\n\t\t\t\t\"level-of-operation\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"enum\": [\"page\", \"region\"],\n\t\t\t\t\t\"description\": \"PAGE XML hierarchy level granularity to annotate images for\",\n\t\t\t\t\t\"default\": \"region\"\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-ocropy-denoise\": {\n\t\t\t\"executable\": \"ocrd-cis-ocropy-denoise\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Image preprocessing\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"preprocessing/optimization/despeckling\"\n\t\t\t],\n\t\t\t\"input_file_grp\": [\n\t\t\t\t\"OCR-D-IMG\",\n\t\t\t\t\"OCR-D-SEG-BLOCK\",\n\t\t\t\t\"OCR-D-SEG-LINE\"\n\t\t\t],\n\t\t\t\"output_file_grp\": [\n\t\t\t\t\"OCR-D-IMG-DESPECK\",\n\t\t\t\t\"OCR-D-SEG-BLOCK\",\n\t\t\t\t\"OCR-D-SEG-LINE\"\n\t\t\t],\n\t\t\t\"description\": \"Despeckle pages / regions / lines with ocropy\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"noise_maxsize\": {\n\t\t\t\t\t\"type\": \"number\",\n\t\t\t\t\t\"description\": \"maximum pixel number for connected components to regard as noise (0 will deactivate denoising)\",\n\t\t\t\t\t\"default\": 2\n\t\t\t\t},\n\t\t\t\t\"level-of-operation\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"enum\": [\"page\", \"region\", \"line\"],\n\t\t\t\t\t\"description\": \"PAGE XML hierarchy level granularity to annotate images for\",\n\t\t\t\t\t\"default\": \"page\"\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-ocropy-clip\": {\n\t\t\t\"executable\": \"ocrd-cis-ocropy-clip\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Layout analysis\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"layout/segmentation/region\",\n\t\t\t\t\"layout/segmentation/line\"\n\t\t\t],\n\t\t\t\"input_file_grp\": [\n\t\t\t\t\"OCR-D-SEG-BLOCK\",\n\t\t\t\t\"OCR-D-SEG-LINE\"\n\t\t\t],\n\t\t\t\"output_file_grp\": [\n\t\t\t\t\"OCR-D-SEG-BLOCK\",\n\t\t\t\t\"OCR-D-SEG-LINE\"\n\t\t\t],\n\t\t\t\"description\": \"Clip text regions / lines at intersections with neighbours\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"level-of-operation\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"enum\": [\"region\", \"line\"],\n\t\t\t\t\t\"description\": \"PAGE XML hierarchy level granularity to annotate images for\",\n\t\t\t\t\t\"default\": \"region\"\n\t\t\t\t},\n\t\t\t\t\"min_fraction\": {\n\t\t\t\t\t\"type\": \"number\",\n\t\t\t\t\t\"format\": \"float\",\n\t\t\t\t\t\"description\": \"share of foreground pixels that must be retained by the largest label\",\n\t\t\t\t\t\"default\": 0.7\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-ocropy-resegment\": {\n\t\t\t\"executable\": \"ocrd-cis-ocropy-resegment\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Layout analysis\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"layout/segmentation/line\"\n\t\t\t],\n\t\t\t\"input_file_grp\": [\n\t\t\t\t\"OCR-D-SEG-LINE\"\n\t\t\t],\n\t\t\t\"output_file_grp\": [\n\t\t\t\t\"OCR-D-SEG-LINE\"\n\t\t\t],\n\t\t\t\"description\": \"Resegment lines with ocropy (by shrinking annotated polygons)\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"min_fraction\": {\n\t\t\t\t\t\"type\": \"number\",\n\t\t\t\t\t\"format\": \"float\",\n\t\t\t\t\t\"description\": \"share of foreground pixels that must be retained by the largest 
label\",\n\t\t\t\t\t\"default\": 0.8\n\t\t\t\t},\n\t\t\t\t\"extend_margins\": {\n\t\t\t\t\t\"type\": \"number\",\n\t\t\t\t\t\"format\": \"integer\",\n\t\t\t\t\t\"description\": \"number of pixels to extend the input polygons horizontally and vertically before intersecting\",\n\t\t\t\t\t\"default\": 3\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-ocropy-dewarp\": {\n\t\t\t\"executable\": \"ocrd-cis-ocropy-dewarp\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Image preprocessing\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"preprocessing/optimization/dewarping\"\n\t\t\t],\n\t\t\t\"description\": \"Dewarp line images with ocropy\",\n\t\t\t\"input_file_grp\": [\n\t\t\t\t\"OCR-D-SEG-LINE\"\n\t\t\t],\n\t\t\t\"output_file_grp\": [\n\t\t\t\t\"OCR-D-SEG-LINE\"\n\t\t\t],\n\t\t\t\"parameters\": {\n\t\t\t\t\"range\": {\n\t\t\t\t\t\"type\": \"number\",\n\t\t\t\t\t\"description\": \"maximum vertical disposition or maximum margin (will be multiplied by mean centerline deltas to yield pixels)\",\n\t\t\t\t\t\"default\": 4\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-ocropy-recognize\": {\n\t\t\t\"executable\": \"ocrd-cis-ocropy-recognize\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Text recognition and optimization\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"recognition/text-recognition\"\n\t\t\t],\n\t\t\t\"description\": \"Recognize text in (binarized+deskewed+dewarped) lines with ocropy\",\n\t\t\t\"input_file_grp\": [\n\t\t\t\t\"OCR-D-SEG-LINE\",\n\t\t\t\t\"OCR-D-SEG-WORD\",\n\t\t\t\t\"OCR-D-SEG-GLYPH\"\n\t\t\t],\n\t\t\t\"output_file_grp\": [\n\t\t\t\t\"OCR-D-OCR-OCRO\"\n\t\t\t],\n\t\t\t\"parameters\": {\n\t\t\t\t\"textequiv_level\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"enum\": [\"line\", \"word\", \"glyph\"],\n\t\t\t\t\t\"description\": \"PAGE XML hierarchy level granularity to add the TextEquiv results to\",\n\t\t\t\t\t\"default\": \"line\"\n\t\t\t\t},\n\t\t\t\t\"model\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"description\": \"ocropy model to apply (e.g. fraktur.pyrnn)\"\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-ocropy-rec\": {\n\t\t\t\"executable\": \"ocrd-cis-ocropy-rec\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Text recognition and optimization\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"recognition/text-recognition\"\n\t\t\t],\n\t\t\t\"description\": \"Recognize text snippets\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"model\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"description\": \"ocropy model to apply (e.g. 
fraktur.pyrnn)\"\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-ocropy-segment\": {\n\t\t\t\"executable\": \"ocrd-cis-ocropy-segment\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Layout analysis\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"layout/segmentation/region\",\n\t\t\t\t\"layout/segmentation/line\"\n\t\t\t],\n\t\t\t\"input_file_grp\": [\n\t\t\t\t\"OCR-D-GT-SEG-BLOCK\",\n\t\t\t\t\"OCR-D-SEG-BLOCK\"\n\t\t\t],\n\t\t\t\"output_file_grp\": [\n\t\t\t\t\"OCR-D-SEG-LINE\"\n\t\t\t],\n\t\t\t\"description\": \"Segment pages into regions or regions into lines with ocropy\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"level-of-operation\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"enum\": [\"page\", \"region\"],\n\t\t\t\t\t\"description\": \"PAGE XML hierarchy level to read images from\",\n\t\t\t\t\t\"default\": \"region\"\n\t\t\t\t},\n\t\t\t\t\"maxcolseps\": {\n\t\t\t\t\t\"type\": \"number\",\n\t\t\t\t\t\"format\": \"integer\",\n\t\t\t\t\t\"default\": 2,\n\t\t\t\t\t\"description\": \"number of white/background column separators to try (when operating on the page level)\"\n\t\t\t\t},\n\t\t\t\t\"maxseps\": {\n\t\t\t\t\t\"type\": \"number\",\n\t\t\t\t\t\"format\": \"integer\",\n\t\t\t\t\t\"default\": 5,\n\t\t\t\t\t\"description\": \"number of black/foreground column separators to try, counted individually as lines (when operating on the page level)\"\n\t\t\t\t},\n\t\t\t\t\"overwrite_regions\": {\n\t\t\t\t\t\"type\": \"boolean\",\n\t\t\t\t\t\"default\": true,\n\t\t\t\t\t\"description\": \"remove any existing TextRegion elements (when operating on the page level)\"\n\t\t\t\t},\n\t\t\t\t\"overwrite_lines\": {\n\t\t\t\t\t\"type\": \"boolean\",\n\t\t\t\t\t\"default\": true,\n\t\t\t\t\t\"description\": \"remove any existing TextLine elements (when operating on the region level)\"\n\t\t\t\t},\n\t\t\t\t\"spread\": {\n\t\t\t\t\t\"type\": \"number\",\n\t\t\t\t\t\"format\": \"float\",\n\t\t\t\t\t\"default\": 2.4,\n\t\t\t\t\t\"description\": \"distance in points (pt) from the foreground to project text line (or text region) labels into the background\"\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"cis-ocrd-ocropy-train\": {\n\t\t\t\"executable\": \"ocrd-cis-ocropy-train\",\n\t\t\t\"categories\": [\n\t\t\t\t\"lstm ocropy model training\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"training\"\n\t\t\t],\n\t\t\t\"description\": \"train model with ground truth from mets data\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"textequiv_level\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"enum\": [\"line\", \"word\", \"glyph\"],\n\t\t\t\t\t\"default\": \"line\"\n\t\t\t\t},\n\t\t\t\t\"model\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"description\": \"load model or create new one (e.g. 
fraktur.pyrnn)\"\n\t\t\t\t},\n\t\t\t\t\"ntrain\": {\n\t\t\t\t\t\"type\": \"integer\",\n\t\t\t\t\t\"description\": \"lines to train before stopping\",\n\t\t\t\t\t\"default\": 1000000\n\t\t\t\t},\n\t\t\t\t\"outputpath\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"description\": \"(existing) path for the trained model\"\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-profile\": {\n\t\t\t\"executable\": \"ocrd-cis-profile\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Text recognition and optimization\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"postprocessing/alignment\"\n\t\t\t],\n\t\t\t\"description\": \"Add correction suggestions and suspicious tokens (profile)\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"executable\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"required\": true\n\t\t\t\t},\n\t\t\t\t\"backend\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"required\": true\n\t\t\t\t},\n\t\t\t\t\"language\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"required\": false,\n\t\t\t\t\t\"default\": \"german\"\n\t\t\t\t},\n\t\t\t\t\"additionalLexicon\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"required\": false,\n\t\t\t\t\t\"default\": \"\"\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-train\": {\n\t\t\t\"executable\": \"ocrd-cis-train\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Text recognition and optimization\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"postprocessing/alignment\"\n\t\t\t],\n\t\t\t\"description\": \"Train post-correction model\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"jar\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"required\": true\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-stats\": {\n\t\t\t\"executable\": \"ocrd-cis-stats\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Text recognition and optimization\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"postprocessing/alignment\"\n\t\t\t],\n\t\t\t\"description\": \"Get precision of aligned OCRs\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"none\": {\n\t\t\t\t\t\"type\": \"string\"\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-lang\": {\n\t\t\t\"executable\": \"ocrd-cis-lang\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Text recognition and optimization\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"postprocessing/alignment\"\n\t\t\t],\n\t\t\t\"description\": \"Get language and font of input-file-group\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"none\": {\n\t\t\t\t\t\"type\": \"string\"\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-importer\": {\n\t\t\t\"executable\": \"ocrd-cis-importer\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Text recognition and optimization\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"postprocessing\"\n\t\t\t],\n\t\t\t\"description\": \"different ocropy tool\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"none\": {\n\t\t\t\t\t\"type\": \"string\"\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-cutter\": {\n\t\t\t\"executable\": \"ocrd-cis-cutter\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Text recognition and optimization\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"postprocessing\"\n\t\t\t],\n\t\t\t\"description\": \"cut lines from input-file-groups\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"gtdir\": {\n\t\t\t\t\t\"type\": \"string\"\n\t\t\t\t}\n\t\t\t}\n\t\t},\n\t\t\"ocrd-cis-clean\": {\n\t\t\t\"executable\": \"ocrd-cis-clean\",\n\t\t\t\"categories\": [\n\t\t\t\t\"Text recognition and optimization\"\n\t\t\t],\n\t\t\t\"steps\": [\n\t\t\t\t\"postprocessing\"\n\t\t\t],\n\t\t\t\"description\": \"clean-up-tool\",\n\t\t\t\"parameters\": {\n\t\t\t\t\"mainLevel\": {\n\t\t\t\t\t\"type\": \"string\",\n\t\t\t\t\t\"enum\": [\"line\", \"word\", \"glyph\"],\n\t\t\t\t\t\"default\": 
\"line\"\n\t\t\t\t},\n\t\t\t\t\"mainIndex\": {\n\t\t\t\t\t\"type\": \"integer\",\n\t\t\t\t\t\"description\": \"model index\"\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t}\n}\n",
+            "setup.py": "\"\"\"\nInstalls:\n    - ocrd-cis-align\n    - ocrd-cis-profile\n    - ocrd-cis-ocropy-clip\n    - ocrd-cis-ocropy-denoise\n    - ocrd-cis-ocropy-deskew\n    - ocrd-cis-ocropy-binarize\n    - ocrd-cis-ocropy-resegment\n    - ocrd-cis-ocropy-segment\n    - ocrd-cis-ocropy-dewarp\n    - ocrd-cis-ocropy-recognize\n    - ocrd-cis-ocropy-train\n    - ocrd-cis-aio\n    - ocrd-cis-stats\n    - ocrd-cis-lang\n    - ocrd-cis-clean\n    - ocrd-cis-cutter\n    - ocrd-cis-importer\n\"\"\"\n\nfrom setuptools import setup\nfrom setuptools import find_packages\n\nsetup(\n    include_package_data = True,\n    name='cis-ocrd',\n    version='0.0.4',\n    description='description',\n    long_description='long description',\n    author='Florian Fink, Tobias Englmeier, Christoph Weber',\n    author_email='finkf@cis.lmu.de, englmeier@cis.lmu.de, web_chris@msn.com',\n    url='https://github.com/cisocrgroup/cis-ocrd-py',\n    license='MIT',\n    packages=find_packages(),\n    install_requires=[\n        'ocrd>=1.0.0b19',\n        'click',\n        'scipy',\n        'numpy>=1.17.0',\n        'pillow==5.4.1',\n        'matplotlib>3.0.0',\n        'python-Levenshtein',\n        'calamari_ocr'\n    ],\n    package_data={\n        '': ['*.json', '*.yml', '*.yaml'],\n        'ocrd_cis': ['ocrd_cis/jar/ocrd-cis.jar'],\n    },\n    entry_points={\n        'console_scripts': [\n            'ocrd-cis-align=ocrd_cis.align.cli:cis_ocrd_align',\n            'ocrd-cis-profile=ocrd_cis.profile.cli:cis_ocrd_profile',\n            'ocrd-cis-ocropy-binarize=ocrd_cis.ocropy.cli:cis_ocrd_ocropy_binarize',\n            'ocrd-cis-ocropy-clip=ocrd_cis.ocropy.cli:cis_ocrd_ocropy_clip',\n            'ocrd-cis-ocropy-denoise=ocrd_cis.ocropy.cli:cis_ocrd_ocropy_denoise',\n            'ocrd-cis-ocropy-deskew=ocrd_cis.ocropy.cli:cis_ocrd_ocropy_deskew',\n            'ocrd-cis-ocropy-dewarp=ocrd_cis.ocropy.cli:cis_ocrd_ocropy_dewarp',\n            'ocrd-cis-ocropy-recognize=ocrd_cis.ocropy.cli:cis_ocrd_ocropy_recognize',\n            'ocrd-cis-ocropy-rec=ocrd_cis.ocropy.cli:cis_ocrd_ocropy_rec',\n            'ocrd-cis-ocropy-resegment=ocrd_cis.ocropy.cli:cis_ocrd_ocropy_resegment',\n            'ocrd-cis-ocropy-segment=ocrd_cis.ocropy.cli:cis_ocrd_ocropy_segment',\n            'ocrd-cis-ocropy-train=ocrd_cis.ocropy.cli:cis_ocrd_ocropy_train',\n            'ocrd-cis-aio=ocrd_cis.aio.cli:cis_ocrd_aio',\n            'ocrd-cis-stats=ocrd_cis.div.cli:cis_ocrd_stats',\n            'ocrd-cis-lang=ocrd_cis.div.cli:cis_ocrd_lang',\n            'ocrd-cis-clean=ocrd_cis.div.cli:cis_ocrd_clean',\n            'ocrd-cis-importer=ocrd_cis.div.cli:cis_ocrd_importer',\n            'ocrd-cis-cutter=ocrd_cis.div.cli:cis_ocrd_cutter',\n        ]\n    },\n)\n"
         },
         "git": {
-            "last_commit": "Thu Sep 26 17:20:39 2019 +0200",
-            "number_of_commits": "71"
+            "last_commit": "Thu Oct 24 19:20:11 2019 +0200",
+            "number_of_commits": "309"
         },
-        "name": "LAYoutERkennung",
+        "name": "ocrd_cis",
         "ocrd_tool": {
-            "git_url": "https://github.com/mjenckel/LAYoutERkennung/",
+            "git_url": "https://github.com/cisocrgroup/cis-ocrd-py",
             "tools": {
-                "ocrd-anybaseocr-binarize": {
+                "cis-ocrd-ocropy-train": {
                     "categories": [
-                        "Image preprocessing"
-                    ],
-                    "description": "Binarize images with the algorithm from ocropy",
-                    "executable": "ocrd-anybaseocr-binarize",
-                    "input_file_grp": [
-                        "OCR-D-IMG"
-                    ],
-                    "output_file_grp": [
-                        "OCR-D-IMG-BIN"
+                        "lstm ocropy model training"
                     ],
+                    "description": "train model with ground truth from mets data",
+                    "executable": "ocrd-cis-ocropy-train",
                     "parameters": {
-                        "bignore": {
-                            "default": 0.1,
-                            "description": "ignore this much of the border for threshold estimation",
-                            "format": "float",
-                            "type": "number"
-                        },
-                        "debug": {
-                            "default": 0,
-                            "description": "display intermediate results",
-                            "format": "integer",
-                            "type": "number"
-                        },
-                        "escale": {
-                            "default": 1.0,
-                            "description": "scale for estimating a mask over the text region",
-                            "format": "float",
-                            "type": "number"
-                        },
-                        "gray": {
-                            "default": false,
-                            "description": "force grayscale processing even if image seems binary",
-                            "type": "boolean"
-                        },
-                        "hi": {
-                            "default": 90,
-                            "description": "percentile for white estimation",
-                            "format": "float",
-                            "type": "number"
+                        "model": {
+                            "description": "load model or create new one (e.g. fraktur.pyrnn)",
+                            "type": "string"
                         },
-                        "lo": {
-                            "default": 5,
-                            "description": "percentile for black estimation",
-                            "format": "float",
-                            "type": "number"
+                        "ntrain": {
+                            "default": 1000000,
+                            "description": "lines to train before stopping",
+                            "type": "integer"
                         },
-                        "nocheck": {
-                            "default": false,
-                            "description": "disable error checking on inputs",
-                            "type": "boolean"
+                        "outputpath": {
+                            "description": "(existing) path for the trained model",
+                            "type": "string"
                         },
-                        "operation_level": {
-                            "default": "page",
-                            "description": "PAGE XML hierarchy level to operate on",
+                        "textequiv_level": {
+                            "default": "line",
                             "enum": [
-                                "page",
-                                "region",
-                                "line"
+                                "line",
+                                "word",
+                                "glyph"
                             ],
                             "type": "string"
-                        },
-                        "perc": {
-                            "default": 80,
-                            "description": "percentage for filters",
-                            "format": "float",
-                            "type": "number"
-                        },
-                        "range": {
-                            "default": 20,
-                            "description": "range for filters",
-                            "format": "integer",
-                            "type": "number"
-                        },
-                        "raw_copy": {
-                            "default": false,
-                            "description": "also copy the raw image",
-                            "type": "boolean"
-                        },
-                        "show": {
-                            "default": false,
-                            "description": "display final results",
-                            "type": "boolean"
-                        },
-                        "threshold": {
-                            "default": 0.5,
-                            "description": "threshold, determines lightness",
-                            "format": "float",
-                            "type": "number"
-                        },
-                        "zoom": {
-                            "default": 0.5,
-                            "description": "zoom for page background estimation, smaller=faster",
-                            "format": "float",
-                            "type": "number"
                         }
                     },
                     "steps": [
-                        "preprocessing/optimization/binarization"
+                        "training"
                     ]
                 },
-                "ocrd-anybaseocr-block-segmentation": {
+                "ocrd-cis-aio": {
                     "categories": [
-                        "Layout analysis"
-                    ],
-                    "description": "Analysis of the input document",
-                    "executable": "ocrd-anybaseocr-block-segmentation",
-                    "input_file_grp": [
-                        "OCR-D-IMG"
-                    ],
-                    "output_file_grp": [
-                        "OCR-D-BLOCK-SEGMENT"
+                        "Text recognition and optimization"
                     ],
+                    "description": "All in One Tool",
+                    "executable": "ocrd-cis-aio",
                     "parameters": {
-                        "block_segmentation_model": {
-                            "description": "Path to Layout Structure Classification Model",
+                        "alignparampath": {
                             "required": true,
                             "type": "string"
                         },
-                        "block_segmentation_weights": {
-                            "description": "Path to Layout Structure Classes",
+                        "ocropyparampath1": {
+                            "required": true,
+                            "type": "string"
+                        },
+                        "ocropyparampath2": {
+                            "required": true,
+                            "type": "string"
+                        },
+                        "tesserparampath": {
                             "required": true,
                             "type": "string"
                         }
                     },
                     "steps": [
-                        "layout/segmentation/text-image"
+                        "postprocessing/alignment/recognition"
                     ]
                 },
-                "ocrd-anybaseocr-crop": {
+                "ocrd-cis-align": {
                     "categories": [
-                        "Image preprocessing"
-                    ],
-                    "description": "Image crop using non-linear processing",
-                    "executable": "ocrd-anybaseocr-crop",
-                    "input_file_grp": [
-                        "OCR-D-IMG-DESKEW"
+                        "Text recognition and optimization"
                     ],
-                    "output_file_grp": [
-                        "OCR-D-IMG-CROP"
+                    "description": "Align multiple OCRs and/or GTs",
+                    "executable": "ocrd-cis-align",
+                    "steps": [
+                        "postprocessing/alignment"
+                    ]
+                },
+                "ocrd-cis-clean": {
+                    "categories": [
+                        "Text recognition and optimization"
                     ],
+                    "description": "clean-up-tool",
+                    "executable": "ocrd-cis-clean",
                     "parameters": {
-                        "colSeparator": {
-                            "default": 0.04,
-                            "description": "consider space between column. 25% of width",
-                            "format": "float",
-                            "type": "number"
-                        },
-                        "maxRularArea": {
-                            "default": 0.3,
-                            "description": "Consider maximum rular area",
-                            "format": "float",
-                            "type": "number"
-                        },
-                        "minArea": {
-                            "default": 0.05,
-                            "description": "rular position in below",
-                            "format": "float",
-                            "type": "number"
-                        },
-                        "minRularArea": {
-                            "default": 0.01,
-                            "description": "Consider minimum rular area",
-                            "format": "float",
-                            "type": "number"
+                        "mainIndex": {
+                            "description": "model index",
+                            "type": "integer"
                         },
-                        "operation_level": {
+                        "mainLevel": {
+                            "default": "line",
+                            "enum": [
+                                "line",
+                                "word",
+                                "glyph"
+                            ],
+                            "type": "string"
+                        }
+                    },
+                    "steps": [
+                        "postprocessing"
+                    ]
+                },
+                "ocrd-cis-cutter": {
+                    "categories": [
+                        "Text recognition and optimization"
+                    ],
+                    "description": "cut lines from input-file-groups",
+                    "executable": "ocrd-cis-cutter",
+                    "parameters": {
+                        "gtdir": {
+                            "type": "string"
+                        }
+                    },
+                    "steps": [
+                        "postprocessing"
+                    ]
+                },
+                "ocrd-cis-importer": {
+                    "categories": [
+                        "Text recognition and optimization"
+                    ],
+                    "description": "different ocropy tool",
+                    "executable": "ocrd-cis-importer",
+                    "parameters": {
+                        "none": {
+                            "type": "string"
+                        }
+                    },
+                    "steps": [
+                        "postprocessing"
+                    ]
+                },
+                "ocrd-cis-lang": {
+                    "categories": [
+                        "Text recognition and optimization"
+                    ],
+                    "description": "Get language and font of input-file-group",
+                    "executable": "ocrd-cis-lang",
+                    "parameters": {
+                        "none": {
+                            "type": "string"
+                        }
+                    },
+                    "steps": [
+                        "postprocessing/alignment"
+                    ]
+                },
+                "ocrd-cis-ocropy-binarize": {
+                    "categories": [
+                        "Image preprocessing"
+                    ],
+                    "description": "Binarize (and optionally deskew/despeckle) pages / regions / lines with ocropy",
+                    "executable": "ocrd-cis-ocropy-binarize",
+                    "input_file_grp": [
+                        "OCR-D-IMG",
+                        "OCR-D-SEG-BLOCK",
+                        "OCR-D-SEG-LINE"
+                    ],
+                    "output_file_grp": [
+                        "OCR-D-IMG-BIN",
+                        "OCR-D-SEG-BLOCK",
+                        "OCR-D-SEG-LINE"
+                    ],
+                    "parameters": {
+                        "grayscale": {
+                            "default": false,
+                            "description": "for the ocropy method, produce grayscale-normalized instead of thresholded image",
+                            "type": "boolean"
+                        },
+                        "level-of-operation": {
+                            "default": "page",
+                            "description": "PAGE XML hierarchy level granularity to annotate images for",
+                            "enum": [
+                                "page",
+                                "region",
+                                "line"
+                            ],
+                            "type": "string"
+                        },
+                        "maxskew": {
+                            "default": 0.0,
+                            "description": "modulus of maximum skewing angle to detect (larger will be slower, 0 will deactivate deskewing)",
+                            "type": "number"
+                        },
+                        "method": {
+                            "default": "ocropy",
+                            "description": "binarization method to use (only ocropy will include deskewing)",
+                            "enum": [
+                                "none",
+                                "global",
+                                "otsu",
+                                "gauss-otsu",
+                                "ocropy"
+                            ],
+                            "type": "string"
+                        },
+                        "noise_maxsize": {
+                            "default": 0,
+                            "description": "maximum pixel number for connected components to regard as noise (0 will deactivate denoising)",
+                            "type": "number"
+                        }
+                    },
+                    "steps": [
+                        "preprocessing/optimization/binarization",
+                        "preprocessing/optimization/grayscale_normalization",
+                        "preprocessing/optimization/deskewing"
+                    ]
+                },
+                "ocrd-cis-ocropy-clip": {
+                    "categories": [
+                        "Layout analysis"
+                    ],
+                    "description": "Clip text regions / lines at intersections with neighbours",
+                    "executable": "ocrd-cis-ocropy-clip",
+                    "input_file_grp": [
+                        "OCR-D-SEG-BLOCK",
+                        "OCR-D-SEG-LINE"
+                    ],
+                    "output_file_grp": [
+                        "OCR-D-SEG-BLOCK",
+                        "OCR-D-SEG-LINE"
+                    ],
+                    "parameters": {
+                        "level-of-operation": {
+                            "default": "region",
+                            "description": "PAGE XML hierarchy level granularity to annotate images for",
+                            "enum": [
+                                "region",
+                                "line"
+                            ],
+                            "type": "string"
+                        },
+                        "min_fraction": {
+                            "default": 0.7,
+                            "description": "share of foreground pixels that must be retained by the largest label",
+                            "format": "float",
+                            "type": "number"
+                        }
+                    },
+                    "steps": [
+                        "layout/segmentation/region",
+                        "layout/segmentation/line"
+                    ]
+                },
+                "ocrd-cis-ocropy-denoise": {
+                    "categories": [
+                        "Image preprocessing"
+                    ],
+                    "description": "Despeckle pages / regions / lines with ocropy",
+                    "executable": "ocrd-cis-ocropy-denoise",
+                    "input_file_grp": [
+                        "OCR-D-IMG",
+                        "OCR-D-SEG-BLOCK",
+                        "OCR-D-SEG-LINE"
+                    ],
+                    "output_file_grp": [
+                        "OCR-D-IMG-DESPECK",
+                        "OCR-D-SEG-BLOCK",
+                        "OCR-D-SEG-LINE"
+                    ],
+                    "parameters": {
+                        "level-of-operation": {
+                            "default": "page",
+                            "description": "PAGE XML hierarchy level granularity to annotate images for",
+                            "enum": [
+                                "page",
+                                "region",
+                                "line"
+                            ],
+                            "type": "string"
+                        },
+                        "noise_maxsize": {
+                            "default": 2,
+                            "description": "maximum pixel number for connected components to regard as noise (0 will deactivate denoising)",
+                            "type": "number"
+                        }
+                    },
+                    "steps": [
+                        "preprocessing/optimization/despeckling"
+                    ]
+                },
+                "ocrd-cis-ocropy-deskew": {
+                    "categories": [
+                        "Image preprocessing"
+                    ],
+                    "description": "Deskew regions with ocropy (by annotating orientation angle and adding AlternativeImage)",
+                    "executable": "ocrd-cis-ocropy-deskew",
+                    "input_file_grp": [
+                        "OCR-D-SEG-BLOCK",
+                        "OCR-D-SEG-LINE"
+                    ],
+                    "output_file_grp": [
+                        "OCR-D-SEG-BLOCK",
+                        "OCR-D-SEG-LINE"
+                    ],
+                    "parameters": {
+                        "level-of-operation": {
+                            "default": "region",
+                            "description": "PAGE XML hierarchy level granularity to annotate images for",
+                            "enum": [
+                                "page",
+                                "region"
+                            ],
+                            "type": "string"
+                        },
+                        "maxskew": {
+                            "default": 5.0,
+                            "description": "modulus of maximum skewing angle to detect (larger will be slower, 0 will deactivate deskewing)",
+                            "type": "number"
+                        }
+                    },
+                    "steps": [
+                        "preprocessing/optimization/deskewing"
+                    ]
+                },
+                "ocrd-cis-ocropy-dewarp": {
+                    "categories": [
+                        "Image preprocessing"
+                    ],
+                    "description": "Dewarp line images with ocropy",
+                    "executable": "ocrd-cis-ocropy-dewarp",
+                    "input_file_grp": [
+                        "OCR-D-SEG-LINE"
+                    ],
+                    "output_file_grp": [
+                        "OCR-D-SEG-LINE"
+                    ],
+                    "parameters": {
+                        "range": {
+                            "default": 4,
+                            "description": "maximum vertical disposition or maximum margin (will be multiplied by mean centerline deltas to yield pixels)",
+                            "type": "number"
+                        }
+                    },
+                    "steps": [
+                        "preprocessing/optimization/dewarping"
+                    ]
+                },
+                "ocrd-cis-ocropy-rec": {
+                    "categories": [
+                        "Text recognition and optimization"
+                    ],
+                    "description": "Recognize text snippets",
+                    "executable": "ocrd-cis-ocropy-rec",
+                    "parameters": {
+                        "model": {
+                            "description": "ocropy model to apply (e.g. fraktur.pyrnn)",
+                            "type": "string"
+                        }
+                    },
+                    "steps": [
+                        "recognition/text-recognition"
+                    ]
+                },
+                "ocrd-cis-ocropy-recognize": {
+                    "categories": [
+                        "Text recognition and optimization"
+                    ],
+                    "description": "Recognize text in (binarized+deskewed+dewarped) lines with ocropy",
+                    "executable": "ocrd-cis-ocropy-recognize",
+                    "input_file_grp": [
+                        "OCR-D-SEG-LINE",
+                        "OCR-D-SEG-WORD",
+                        "OCR-D-SEG-GLYPH"
+                    ],
+                    "output_file_grp": [
+                        "OCR-D-OCR-OCRO"
+                    ],
+                    "parameters": {
+                        "model": {
+                            "description": "ocropy model to apply (e.g. fraktur.pyrnn)",
+                            "type": "string"
+                        },
+                        "textequiv_level": {
+                            "default": "line",
+                            "description": "PAGE XML hierarchy level granularity to add the TextEquiv results to",
+                            "enum": [
+                                "line",
+                                "word",
+                                "glyph"
+                            ],
+                            "type": "string"
+                        }
+                    },
+                    "steps": [
+                        "recognition/text-recognition"
+                    ]
+                },
+                "ocrd-cis-ocropy-resegment": {
+                    "categories": [
+                        "Layout analysis"
+                    ],
+                    "description": "Resegment lines with ocropy (by shrinking annotated polygons)",
+                    "executable": "ocrd-cis-ocropy-resegment",
+                    "input_file_grp": [
+                        "OCR-D-SEG-LINE"
+                    ],
+                    "output_file_grp": [
+                        "OCR-D-SEG-LINE"
+                    ],
+                    "parameters": {
+                        "extend_margins": {
+                            "default": 3,
+                            "description": "number of pixels to extend the input polygons horizontally and vertically before intersecting",
+                            "format": "integer",
+                            "type": "number"
+                        },
+                        "min_fraction": {
+                            "default": 0.8,
+                            "description": "share of foreground pixels that must be retained by the largest label",
+                            "format": "float",
+                            "type": "number"
+                        }
+                    },
+                    "steps": [
+                        "layout/segmentation/line"
+                    ]
+                },
+                "ocrd-cis-ocropy-segment": {
+                    "categories": [
+                        "Layout analysis"
+                    ],
+                    "description": "Segment pages into regions or regions into lines with ocropy",
+                    "executable": "ocrd-cis-ocropy-segment",
+                    "input_file_grp": [
+                        "OCR-D-GT-SEG-BLOCK",
+                        "OCR-D-SEG-BLOCK"
+                    ],
+                    "output_file_grp": [
+                        "OCR-D-SEG-LINE"
+                    ],
+                    "parameters": {
+                        "level-of-operation": {
+                            "default": "region",
+                            "description": "PAGE XML hierarchy level to read images from",
+                            "enum": [
+                                "page",
+                                "region"
+                            ],
+                            "type": "string"
+                        },
+                        "maxcolseps": {
+                            "default": 2,
+                            "description": "number of white/background column separators to try (when operating on the page level)",
+                            "format": "integer",
+                            "type": "number"
+                        },
+                        "maxseps": {
+                            "default": 5,
+                            "description": "number of black/foreground column separators to try, counted individually as lines (when operating on the page level)",
+                            "format": "integer",
+                            "type": "number"
+                        },
+                        "overwrite_lines": {
+                            "default": true,
+                            "description": "remove any existing TextLine elements (when operating on the region level)",
+                            "type": "boolean"
+                        },
+                        "overwrite_regions": {
+                            "default": true,
+                            "description": "remove any existing TextRegion elements (when operating on the page level)",
+                            "type": "boolean"
+                        },
+                        "spread": {
+                            "default": 2.4,
+                            "description": "distance in points (pt) from the foreground to project text line (or text region) labels into the background",
+                            "format": "float",
+                            "type": "number"
+                        }
+                    },
+                    "steps": [
+                        "layout/segmentation/region",
+                        "layout/segmentation/line"
+                    ]
+                },
+                "ocrd-cis-profile": {
+                    "categories": [
+                        "Text recognition and optimization"
+                    ],
+                    "description": "Add a correction suggestions and suspicious tokens (profile)",
+                    "executable": "ocrd-cis-profile",
+                    "parameters": {
+                        "additionalLexicon": {
+                            "default": "",
+                            "required": false,
+                            "type": "string"
+                        },
+                        "backend": {
+                            "required": true,
+                            "type": "string"
+                        },
+                        "executable": {
+                            "required": true,
+                            "type": "string"
+                        },
+                        "language": {
+                            "default": "german",
+                            "required": false,
+                            "type": "string"
+                        }
+                    },
+                    "steps": [
+                        "postprocessing/alignment"
+                    ]
+                },
+                "ocrd-cis-stats": {
+                    "categories": [
+                        "Text recognition and optimization"
+                    ],
+                    "description": "Get Precision of aligned ocrs",
+                    "executable": "ocrd-cis-stats",
+                    "parameters": {
+                        "none": {
+                            "type": "string"
+                        }
+                    },
+                    "steps": [
+                        "postprocessing/alignment"
+                    ]
+                },
+                "ocrd-cis-train": {
+                    "categories": [
+                        "Text recognition and optimization"
+                    ],
+                    "description": "Train post correction model",
+                    "executable": "ocrd-cis-train",
+                    "parameters": {
+                        "jar": {
+                            "required": true,
+                            "type": "string"
+                        }
+                    },
+                    "steps": [
+                        "postprocessing/alignment"
+                    ]
+                }
+            },
+            "version": "0.0.1"
+        },
+        "ocrd_tool_validate": "<report valid=\"false\">\n  <error>[tools.ocrd-cis-aio] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-cis-aio.parameters.tesserparampath] 'description' is a required property</error>\n  <error>[tools.ocrd-cis-aio.parameters.ocropyparampath1] 'description' is a required property</error>\n  <error>[tools.ocrd-cis-aio.parameters.ocropyparampath2] 'description' is a required property</error>\n  <error>[tools.ocrd-cis-aio.parameters.alignparampath] 'description' is a required property</error>\n  <error>[tools.ocrd-cis-aio.steps.0] 'postprocessing/alignment/recognition' is not one of ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/text-nontext', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis']</error>\n  <error>[tools.ocrd-cis-align] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-cis-align.steps.0] 'postprocessing/alignment' is not one of ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/text-nontext', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis']</error>\n  <error>[tools.ocrd-cis-ocropy-rec] 'input_file_grp' is a required property</error>\n  <error>[tools.cis-ocrd-ocropy-train] 'input_file_grp' is a required property</error>\n  <error>[tools.cis-ocrd-ocropy-train.parameters.textequiv_level] 'description' is a required property</error>\n  <error>[tools.cis-ocrd-ocropy-train.parameters.ntrain.type] 'integer' is not one of ['string', 'number', 'boolean']</error>\n  <error>[tools.cis-ocrd-ocropy-train.categories.0] 'lstm ocropy model training' is not one of ['Image preprocessing', 'Layout analysis', 'Text recognition and optimization', 'Model training', 'Long-term preservation', 'Quality assurance']</error>\n  <error>[tools.cis-ocrd-ocropy-train.steps.0] 'training' is not one of ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/text-nontext', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis']</error>\n  <error>[tools.ocrd-cis-profile] 'input_file_grp' is a required property</error>\n  
<error>[tools.ocrd-cis-profile.parameters.executable] 'description' is a required property</error>\n  <error>[tools.ocrd-cis-profile.parameters.backend] 'description' is a required property</error>\n  <error>[tools.ocrd-cis-profile.parameters.language] 'description' is a required property</error>\n  <error>[tools.ocrd-cis-profile.parameters.additionalLexicon] 'description' is a required property</error>\n  <error>[tools.ocrd-cis-profile.steps.0] 'postprocessing/alignment' is not one of ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/text-nontext', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis']</error>\n  <error>[tools.ocrd-cis-train] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-cis-train.parameters.jar] 'description' is a required property</error>\n  <error>[tools.ocrd-cis-train.steps.0] 'postprocessing/alignment' is not one of ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/text-nontext', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis']</error>\n  <error>[tools.ocrd-cis-stats] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-cis-stats.parameters.none] 'description' is a required property</error>\n  <error>[tools.ocrd-cis-stats.steps.0] 'postprocessing/alignment' is not one of ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/text-nontext', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis']</error>\n  <error>[tools.ocrd-cis-lang] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-cis-lang.parameters.none] 'description' is a required property</error>\n  <error>[tools.ocrd-cis-lang.steps.0] 'postprocessing/alignment' is not one of ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 
'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/text-nontext', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis']</error>\n  <error>[tools.ocrd-cis-importer] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-cis-importer.parameters.none] 'description' is a required property</error>\n  <error>[tools.ocrd-cis-importer.steps.0] 'postprocessing' is not one of ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/text-nontext', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis']</error>\n  <error>[tools.ocrd-cis-cutter] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-cis-cutter.parameters.gtdir] 'description' is a required property</error>\n  <error>[tools.ocrd-cis-cutter.steps.0] 'postprocessing' is not one of ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/text-nontext', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis']</error>\n  <error>[tools.ocrd-cis-clean] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-cis-clean.parameters.mainLevel] 'description' is a required property</error>\n  <error>[tools.ocrd-cis-clean.parameters.mainIndex.type] 'integer' is not one of ['string', 'number', 'boolean']</error>\n  <error>[tools.ocrd-cis-clean.steps.0] 'postprocessing' is not one of ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/text-nontext', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis']</error>\n</report>",
+        "org_plus_name": "cisocrgroup/ocrd_cis",
+        "python": {
+            "author": "Florian Fink, Tobias Englmeier, Christoph Weber",
+            "author-email": "finkf@cis.lmu.de, englmeier@cis.lmu.de, web_chris@msn.com",
+            "name": "cis-ocrd",
+            "url": "https://github.com/cisocrgroup/cis-ocrd-py"
+        },
+        "url": "https://github.com/cisocrgroup/ocrd_cis"
+    },
+    {
+        "files": {
+            "Dockerfile": null,
+            "README.md": "# Document Preprocessing and Segmentation\n\n[![CircleCI](https://circleci.com/gh/mjenckel/OCR-D-LAYoutERkennung.svg?style=svg)](https://circleci.com/gh/mjenckel/OCR-D-LAYoutERkennung)\n\n> Tools for preprocessing scanned images for OCR\n\n# Installing\n\nTo install anyBaseOCR dependencies system-wide:\n\n    $ sudo pip install .\n\nAlternatively, dependencies can be installed into a Virtual Environment:\n\n    $ virtualenv venv\n    $ source venv/bin/activate\n    $ pip install -e .\n\n## Tools included\n\nTo see how to run binarization, deskew, crop and dewarp, text/non-text segmentation and textline segmentation methods, please follow corresponding below files for a detailed description :\n\n   * [README_binarize.md](https://github.com/mjenckel/OCR-D-LAYoutERkennung/tree/master/docs/README_binarize.md) instruction for binarization method\n   * [README_deskew.md](https://github.com/mjenckel/OCR-D-LAYoutERkennung/tree/master/docs/README_deskew.md) instruction for deskew method\n   * [README_cropping.md](https://github.com/mjenckel/OCR-D-LAYoutERkennung/tree/master/docs/README_cropping.md) instruction for cropping method\n   * [README_dewarp.md](https://github.com/mjenckel/OCR-D-LAYoutERkennung/tree/master/docs/README_dewarp.md) instruction for dewarp method\n   * [README_tiseg.md](https://github.com/mjenckel/OCR-D-LAYoutERkennung/tree/master/docs/README_tigseg.md) instruction for text/non-text segmentation method\n   * [README_textline.md](https://github.com/mjenckel/OCR-D-LAYoutERkennung/tree/master/docs/README_textline.md) instruction for textline segmentation method\n\n## Binarizer\n\n### Method Behaviour \n This function takes a scanned colored /gray scale document image as input and do the black and white binarize image.\n \n #### Usage:\n```sh\nocrd-anybaseocr-binarize -m (path to METs input file) -I (Input group name) -O (Output group name) [-p (path to parameter file) -o (METs output filename)]\n```\n\n#### Example: \n```sh\nocrd-anybaseocr-binarize \\\n   -m mets.xml \\\n   -I OCR-D-IMG \\\n   -O OCR-D-IMG-BIN\n```\n\n## Deskewer\n\n### Method Behaviour \n This function takes a document image as input and do the skew correction of that document.\n \n #### Usage:\n```sh\nocrd-anybaseocr-deskew -m (path to METs input file) -I (Input group name) -O (Output group name) [-p (path to parameter file) -o (METs output filename)]\n```\n\n#### Example: \n```sh\nocrd-anybaseocr-deskew \\\n  -m mets.xml \\\n  -I OCR-D-IMG-BIN \\\n  -O OCR-D-IMG-DESKEW\n```\n\n## Cropper\n\n### Method Behaviour \n This function takes a document image as input and crops/selects the page content area only (that's mean remove textual noise as well as any other noise around page content area)\n \n #### Usage:\n```sh\nocrd-anybaseocr-cropping -m (path to METs input file) -I (Input group name) -O (Output group name) [-p (path to parameter file) -o (METs output filename)]\n```\n\n#### Example: \n```sh\nocrd-anybaseocr-cropping \\\n   -m mets.xml \\\n   -I OCR-D-IMG-DESKEW \\\n   -O OCR-D-IMG-CROP\n```\n\n\n## Dewarper\n\n### Method Behaviour \n This function takes a document image as input and make the text line straight if its curved.\n \n #### Usage:\n```sh\nocrd-anybaseocr-dewarp -m (path to METs input file) -I (Input group name) -O (Output group name) [-p (path to parameter file) -o (METs output filename)]\n```\n\n\n#### Example: \n```sh\nCUDA_VISIBLE_DEVICES=0 ocrd-anybaseocr-dewarp \\\n   -m mets.xml \\\n   -I OCR-D-IMG-CROP \\\n   -O OCR-D-IMG-DEWARP\n   -p params.json \n```\n\n## 
Text/Non-Text Segmenter\n\n### Method Behaviour \n This function takes a document image as an input and separates the text and non-text part from the input document image.\n \n #### Usage:\n```sh\nocrd-anybaseocr-tiseg -m (path to METs input file) -I (Input group name) -O (Output group name) [-p (path to parameter file) -o (METs output filename)]\n```\n\n#### Example: \n```sh\nocrd-anybaseocr-tiseg \\\n\t-m mets.xml \\\n\t-I OCR-D-IMG-CROP \\\n\t-O OCR-D-IMG-TISEG\n```\n\n## Textline Segmenter\n\n### Method Behaviour \n This function takes a cropped document image as an input and segment the image into textline images.\n \n #### Usage:\n```sh\nocrd-anybaseocr-textline -m (path to METs input file) -I (Input group name) -O (Output group name) [-p (path to parameter file) -o (METs output filename)]\n```\n\n#### Example: \n```sh\nocrd-anybaseocr-textline \\\n\t-m mets.xml \\\n\t-I OCR-D-IMG-TISEG \\\n\t-O OCR-D-IMG-TL\n```\n\n\n## Testing\n\nTo test the tools, download [OCR-D/assets](https://github.com/OCR-D/assets). In\nparticular, the code is tested with the\n[dfki-testdata](https://github.com/OCR-D/assets/tree/master/data/dfki-testdata)\ndataset.\n\nRun `make test` to run all tests.\n\n## License\n\n\n```\n Licensed under the Apache License, Version 2.0 (the \"License\");\n you may not use this file except in compliance with the License.\n You may obtain a copy of the License at\n\n     http://www.apache.org/licenses/LICENSE-2.0\n\n Unless required by applicable law or agreed to in writing, software\n distributed under the License is distributed on an \"AS IS\" BASIS,\n WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n See the License for the specific language governing permissions and\n limitations under the License.\n ```\n",
+            "ocrd-tool.json": "{\n  \"git_url\": \"https://github.com/mjenckel/LAYoutERkennung/\",\n  \"version\": \"0.0.1\",\n  \"tools\": {\n    \"ocrd-anybaseocr-binarize\": {\n      \"executable\": \"ocrd-anybaseocr-binarize\",\n      \"description\": \"Binarize images with the algorithm from ocropy\",\n      \"categories\": [\"Image preprocessing\"],\n      \"steps\": [\"preprocessing/optimization/binarization\"],\n      \"input_file_grp\": [\"OCR-D-IMG\"],\n      \"output_file_grp\": [\"OCR-D-IMG-BIN\"],\n      \"parameters\": {\n        \"nocheck\":         {\"type\": \"boolean\",                     \"default\": false, \"description\": \"disable error checking on inputs\"},\n        \"show\":            {\"type\": \"boolean\",                     \"default\": false, \"description\": \"display final results\"},\n        \"raw_copy\":        {\"type\": \"boolean\",                     \"default\": false, \"description\": \"also copy the raw image\"},\n        \"gray\":            {\"type\": \"boolean\",                     \"default\": false, \"description\": \"force grayscale processing even if image seems binary\"},\n        \"bignore\":         {\"type\": \"number\", \"format\": \"float\",   \"default\": 0.1,   \"description\": \"ignore this much of the border for threshold estimation\"},\n        \"debug\":           {\"type\": \"number\", \"format\": \"integer\", \"default\": 0,     \"description\": \"display intermediate results\"},\n        \"escale\":          {\"type\": \"number\", \"format\": \"float\",   \"default\": 1.0,   \"description\": \"scale for estimating a mask over the text region\"},\n        \"hi\":              {\"type\": \"number\", \"format\": \"float\",   \"default\": 90,    \"description\": \"percentile for white estimation\"},\n        \"lo\":              {\"type\": \"number\", \"format\": \"float\",   \"default\": 5,     \"description\": \"percentile for black estimation\"},\n        \"perc\":            {\"type\": \"number\", \"format\": \"float\",   \"default\": 80,    \"description\": \"percentage for filters\"},\n        \"range\":           {\"type\": \"number\", \"format\": \"integer\", \"default\": 20,    \"description\": \"range for filters\"},\n        \"threshold\":       {\"type\": \"number\", \"format\": \"float\",   \"default\": 0.5,   \"description\": \"threshold, determines lightness\"},\n        \"zoom\":            {\"type\": \"number\", \"format\": \"float\",   \"default\": 0.5,   \"description\": \"zoom for page background estimation, smaller=faster\"},\n        \"operation_level\": {\"type\": \"string\", \"enum\": [\"page\",\"region\", \"line\"], \"default\": \"page\",\"description\": \"PAGE XML hierarchy level to operate on\"}\n      }\n    },\n    \"ocrd-anybaseocr-deskew\": {\n      \"executable\": \"ocrd-anybaseocr-deskew\",\n      \"description\": \"Deskew images with the algorithm from ocropy\",\n      \"categories\": [\"Image preprocessing\"],\n      \"steps\": [\"preprocessing/optimization/deskewing\"],\n      \"input_file_grp\": [\"OCR-D-IMG-BIN\"],\n      \"output_file_grp\": [\"OCR-D-IMG-DESKEW\"],\n      \"parameters\": {\n        \"escale\":    {\"type\": \"number\", \"format\": \"float\",   \"default\": 1.0, \"description\": \"scale for estimating a mask over the text region\"},\n        \"bignore\":   {\"type\": \"number\", \"format\": \"float\",   \"default\": 0.1, \"description\": \"ignore this much of the border for threshold estimation\"},\n        \"threshold\": {\"type\": \"number\", \"format\": \"float\",   
\"default\": 0.5, \"description\": \"threshold, determines lightness\"},\n        \"maxskew\":   {\"type\": \"number\", \"format\": \"float\",   \"default\": 1.0, \"description\": \"skew angle estimation parameters (degrees)\"},\n        \"skewsteps\": {\"type\": \"number\", \"format\": \"integer\", \"default\": 8,   \"description\": \"steps for skew angle estimation (per degree)\"},\n        \"debug\":     {\"type\": \"number\", \"format\": \"integer\", \"default\": 0,   \"description\": \"display intermediate results\"},\n        \"parallel\":  {\"type\": \"number\", \"format\": \"integer\", \"default\": 0,   \"description\": \"???\"},\n        \"lo\":        {\"type\": \"number\", \"format\": \"integer\", \"default\": 5,   \"description\": \"percentile for black estimation\"},\n        \"hi\":        {\"type\": \"number\", \"format\": \"integer\", \"default\": 90,   \"description\": \"percentile for white estimation\"},\n        \"operation_level\": {\"type\": \"string\", \"enum\": [\"page\",\"region\", \"line\"], \"default\": \"page\",\"description\": \"PAGE XML hierarchy level to operate on\"}\n      }\n    },\n    \"ocrd-anybaseocr-crop\": {\n      \"executable\": \"ocrd-anybaseocr-crop\",\n      \"description\": \"Image crop using non-linear processing\",\n      \"categories\": [\"Image preprocessing\"],\n      \"steps\": [\"preprocessing/optimization/cropping\"],\n      \"input_file_grp\": [\"OCR-D-IMG-DESKEW\"],\n      \"output_file_grp\": [\"OCR-D-IMG-CROP\"],\n      \"parameters\": {\n        \"colSeparator\":  {\"type\": \"number\", \"format\": \"float\", \"default\": 0.04, \"description\": \"consider space between column. 25% of width\"},\n        \"maxRularArea\":  {\"type\": \"number\", \"format\": \"float\", \"default\": 0.3, \"description\": \"Consider maximum rular area\"},\n        \"minArea\":       {\"type\": \"number\", \"format\": \"float\", \"default\": 0.05, \"description\": \"rular position in below\"},\n        \"minRularArea\":  {\"type\": \"number\", \"format\": \"float\", \"default\": 0.01, \"description\": \"Consider minimum rular area\"},\n        \"positionBelow\": {\"type\": \"number\", \"format\": \"float\", \"default\": 0.75, \"description\": \"rular position in below\"},\n        \"positionLeft\":  {\"type\": \"number\", \"format\": \"float\", \"default\": 0.4, \"description\": \"rular position in left\"},\n        \"positionRight\": {\"type\": \"number\", \"format\": \"float\", \"default\": 0.6, \"description\": \"rular position in right\"},\n        \"rularRatioMax\": {\"type\": \"number\", \"format\": \"float\", \"default\": 10.0, \"description\": \"rular position in below\"},\n        \"rularRatioMin\": {\"type\": \"number\", \"format\": \"float\", \"default\": 3.0, \"description\": \"rular position in below\"},\n        \"rularWidth\":    {\"type\": \"number\", \"format\": \"float\", \"default\": 0.95, \"description\": \"maximum rular width\"},\n        \"operation_level\": {\"type\": \"string\", \"enum\": [\"page\",\"region\", \"line\"], \"default\": \"page\",\"description\": \"PAGE XML hierarchy level to operate on\"}\n      }\n    },\n    \"ocrd-anybaseocr-dewarp\": {\n      \"executable\": \"ocrd-anybaseocr-dewarp\",\n      \"description\": \"dewarp image with anyBaseOCR\",\n      \"categories\": [\"Image preprocessing\"],\n      \"steps\": [\"preprocessing/optimization/dewarping\"],\n      \"input_file_grp\": [\"OCR-D-IMG-CROP\"],\n      \"output_file_grp\": [\"OCR-D-IMG-DEWARP\"],\n      \"parameters\": {\n        \"imgresize\":    { 
\"type\": \"string\",                      \"default\": \"resize_and_crop\", \"description\": \"run on original size image\"},\n        \"pix2pixHD\":    { \"type\": \"string\",                      \"required\": true, \"description\": \"Path to pix2pixHD library\"},\n        \"gpu_id\":       { \"type\": \"number\", \"format\": \"integer\", \"default\": 0,    \"description\": \"gpu id\"},\n        \"resizeHeight\": { \"type\": \"number\", \"format\": \"integer\", \"default\": 1024, \"description\": \"resized image height\"},\n        \"resizeWidth\":  { \"type\": \"number\", \"format\": \"integer\", \"default\": 1024, \"description\": \"resized image width\"}\n      }\n    },\n    \"ocrd-anybaseocr-tiseg\": {\n      \"executable\": \"ocrd-anybaseocr-tiseg\",\n      \"input_file_grp\": [\"OCR-D-IMG-CROP\"],\n      \"output_file_grp\": [\"OCR-D-SEG-TISEG\"],\n      \"categories\": [\"Layout analysis\"],\n      \"steps\": [\"layout/segmentation/text-image\"],\n      \"description\": \"separate text and non-text part with anyBaseOCR\",\n      \"parameters\": {\n      }\n    },\n    \"ocrd-anybaseocr-textline\": {\n      \"executable\": \"ocrd-anybaseocr-textline\",\n      \"input_file_grp\": [\"OCR-D-SEG-TISEG\"],\n      \"output_file_grp\": [\"OCR-D-SEG-LINE-ANY\"],\n      \"categories\": [\"Layout analysis\"],\n      \"steps\": [\"layout/segmentation/line\"],\n      \"description\": \"separate each text line\",\n      \"parameters\": {\n        \"minscale\":    {\"type\": \"number\", \"format\": \"float\", \"default\": 12.0, \"description\": \"minimum scale permitted\"},\n        \"maxlines\":    {\"type\": \"number\", \"format\": \"float\", \"default\": 300, \"description\": \"non-standard scaling of horizontal parameters\"},\n        \"scale\":       {\"type\": \"number\", \"format\": \"float\", \"default\": 0.0, \"description\": \"the basic scale of the document (roughly, xheight) 0=automatic\"},\n        \"hscale\":      {\"type\": \"number\", \"format\": \"float\", \"default\": 1.0, \"description\": \"non-standard scaling of horizontal parameters\"},\n        \"vscale\":      {\"type\": \"number\", \"format\": \"float\", \"default\": 1.7, \"description\": \"non-standard scaling of vertical parameters\"},\n        \"threshold\":   {\"type\": \"number\", \"format\": \"float\", \"default\": 0.2, \"description\": \"baseline threshold\"},\n        \"noise\":       {\"type\": \"number\", \"format\": \"integer\", \"default\": 8, \"description\": \"noise threshold for removing small components from lines\"},\n        \"usegauss\":    {\"type\": \"boolean\", \"default\": false, \"description\": \"use gaussian instead of uniform\"},\n        \"maxseps\":     {\"type\": \"number\", \"format\": \"integer\", \"default\": 2, \"description\": \"maximum black column separators\"},\n        \"sepwiden\":    {\"type\": \"number\", \"format\": \"integer\", \"default\": 10, \"description\": \"widen black separators (to account for warping)\"},\n        \"blackseps\":   {\"type\": \"boolean\", \"default\": false, \"description\": \"also check for black column separators\"},\n        \"maxcolseps\":  {\"type\": \"number\", \"format\": \"integer\", \"default\": 2, \"description\": \"maximum # whitespace column separators\"},\n        \"csminaspect\": {\"type\": \"number\", \"format\": \"float\", \"default\": 1.1, \"description\": \"minimum aspect ratio for column separators\"},\n        \"csminheight\": {\"type\": \"number\", \"format\": \"float\", \"default\": 6.5, \"description\": \"minimum column height 
(units=scale)\"},\n        \"pad\":         {\"type\": \"number\", \"format\": \"integer\", \"default\": 3, \"description\": \"padding for extracted lines\"},\n        \"expand\":      {\"type\": \"number\", \"format\": \"integer\", \"default\": 3, \"description\": \"expand mask for grayscale extraction\"},\n        \"parallel\":    {\"type\": \"number\", \"format\": \"integer\", \"default\": 0, \"description\": \"number of CPUs to use\"},\n        \"libpath\":     {\"type\": \"string\", \"default\": \".\", \"description\": \"Library Path for C Executables\"}\n      }\n    },\n    \"ocrd-anybaseocr-layout-analysis\": {\n      \"executable\": \"ocrd-anybaseocr-layout-analysis\",\n      \"input_file_grp\": [\"OCR-D-IMG-CROP\"],\n      \"output_file_grp\": [\"OCR-D-SEG-LAYOUT\"],\n      \"categories\": [\"Layout analysis\"],\n      \"steps\": [\"layout/segmentation/text-image\"],\n      \"description\": \"Analysis of the input document\",\n      \"parameters\": {\n        \"batch_size\":         {\"type\": \"number\", \"format\": \"integer\", \"default\": 4, \"description\": \"Batch size for generating test images\"},\n        \"model_path\":         { \"type\": \"string\",                     \"required\": true, \"description\": \"Path to Layout Structure Classification Model\"},\n        \"class_mapping_path\": { \"type\": \"string\",                     \"required\": true, \"description\": \"Path to Layout Structure Classes\"}\n      }\n    },\n    \"ocrd-anybaseocr-block-segmentation\": {\n      \"executable\": \"ocrd-anybaseocr-block-segmentation\",\n      \"input_file_grp\": [\"OCR-D-IMG\"],\n      \"output_file_grp\": [\"OCR-D-BLOCK-SEGMENT\"],\n      \"categories\": [\"Layout analysis\"],\n      \"steps\": [\"layout/segmentation/text-image\"],\n      \"description\": \"Analysis of the input document\",\n      \"parameters\": {        \n        \"block_segmentation_model\":   { \"type\": \"string\",                     \"required\": true, \"description\": \"Path to Layout Structure Classification Model\"},\n        \"block_segmentation_weights\": { \"type\": \"string\",                     \"required\": true, \"description\": \"Path to Layout Structure Classes\"}\n      }\n    }       \n  }\n}\n",
+            "setup.py": "# -*- coding: utf-8 -*-\nfrom setuptools import setup, find_packages\n\nsetup(\n    name='ocrd-anybaseocr',\n    version='v0.0.1',\n    author=\"DFKI\",\n    author_email=\"Saqib.Bukhari@dfki.de, Mohammad_mohsin.reza@dfki.de\",\n    url=\"https://github.com/mjenckel/LAYoutERkennung\",\n    license='Apache License 2.0',\n    long_description=open('README.md').read(),\n    long_description_content_type='text/markdown',\n    install_requires=open('requirements.txt').read().split('\\n'),\n    packages=find_packages(exclude=[\"work_dir\", \"src\"]),\n    package_data={\n        '': ['*.json']\n    },\n    entry_points={\n        'console_scripts': [\n            'ocrd-anybaseocr-binarize           = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_binarize',\n            'ocrd-anybaseocr-deskew             = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_deskew',\n            'ocrd-anybaseocr-crop               = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_cropping',        \n            'ocrd-anybaseocr-dewarp             = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_dewarp',\n            'ocrd-anybaseocr-tiseg              = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_tiseg',\n            'ocrd-anybaseocr-textline           = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_textline',\n            'ocrd-anybaseocr-layout-analysis    = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_layout_analysis',\n            'ocrd-anybaseocr-block-segmentation = ocrd_anybaseocr.cli.cli:ocrd_anybaseocr_block_segmentation'\n        ]\n    },\n)\n"
+        },
+        "git": {
+            "last_commit": "Tue Oct 22 17:00:56 2019 +0200",
+            "number_of_commits": "75"
+        },
+        "name": "LAYoutERkennung",
+        "ocrd_tool": {
+            "git_url": "https://github.com/mjenckel/LAYoutERkennung/",
+            "tools": {
+                "ocrd-anybaseocr-binarize": {
+                    "categories": [
+                        "Image preprocessing"
+                    ],
+                    "description": "Binarize images with the algorithm from ocropy",
+                    "executable": "ocrd-anybaseocr-binarize",
+                    "input_file_grp": [
+                        "OCR-D-IMG"
+                    ],
+                    "output_file_grp": [
+                        "OCR-D-IMG-BIN"
+                    ],
+                    "parameters": {
+                        "bignore": {
+                            "default": 0.1,
+                            "description": "ignore this much of the border for threshold estimation",
+                            "format": "float",
+                            "type": "number"
+                        },
+                        "debug": {
+                            "default": 0,
+                            "description": "display intermediate results",
+                            "format": "integer",
+                            "type": "number"
+                        },
+                        "escale": {
+                            "default": 1.0,
+                            "description": "scale for estimating a mask over the text region",
+                            "format": "float",
+                            "type": "number"
+                        },
+                        "gray": {
+                            "default": false,
+                            "description": "force grayscale processing even if image seems binary",
+                            "type": "boolean"
+                        },
+                        "hi": {
+                            "default": 90,
+                            "description": "percentile for white estimation",
+                            "format": "float",
+                            "type": "number"
+                        },
+                        "lo": {
+                            "default": 5,
+                            "description": "percentile for black estimation",
+                            "format": "float",
+                            "type": "number"
+                        },
+                        "nocheck": {
+                            "default": false,
+                            "description": "disable error checking on inputs",
+                            "type": "boolean"
+                        },
+                        "operation_level": {
+                            "default": "page",
+                            "description": "PAGE XML hierarchy level to operate on",
+                            "enum": [
+                                "page",
+                                "region",
+                                "line"
+                            ],
+                            "type": "string"
+                        },
+                        "perc": {
+                            "default": 80,
+                            "description": "percentage for filters",
+                            "format": "float",
+                            "type": "number"
+                        },
+                        "range": {
+                            "default": 20,
+                            "description": "range for filters",
+                            "format": "integer",
+                            "type": "number"
+                        },
+                        "raw_copy": {
+                            "default": false,
+                            "description": "also copy the raw image",
+                            "type": "boolean"
+                        },
+                        "show": {
+                            "default": false,
+                            "description": "display final results",
+                            "type": "boolean"
+                        },
+                        "threshold": {
+                            "default": 0.5,
+                            "description": "threshold, determines lightness",
+                            "format": "float",
+                            "type": "number"
+                        },
+                        "zoom": {
+                            "default": 0.5,
+                            "description": "zoom for page background estimation, smaller=faster",
+                            "format": "float",
+                            "type": "number"
+                        }
+                    },
+                    "steps": [
+                        "preprocessing/optimization/binarization"
+                    ]
+                },
+                "ocrd-anybaseocr-block-segmentation": {
+                    "categories": [
+                        "Layout analysis"
+                    ],
+                    "description": "Analysis of the input document",
+                    "executable": "ocrd-anybaseocr-block-segmentation",
+                    "input_file_grp": [
+                        "OCR-D-IMG"
+                    ],
+                    "output_file_grp": [
+                        "OCR-D-BLOCK-SEGMENT"
+                    ],
+                    "parameters": {
+                        "block_segmentation_model": {
+                            "description": "Path to Layout Structure Classification Model",
+                            "required": true,
+                            "type": "string"
+                        },
+                        "block_segmentation_weights": {
+                            "description": "Path to Layout Structure Classes",
+                            "required": true,
+                            "type": "string"
+                        }
+                    },
+                    "steps": [
+                        "layout/segmentation/text-image"
+                    ]
+                },
+                "ocrd-anybaseocr-crop": {
+                    "categories": [
+                        "Image preprocessing"
+                    ],
+                    "description": "Image crop using non-linear processing",
+                    "executable": "ocrd-anybaseocr-crop",
+                    "input_file_grp": [
+                        "OCR-D-IMG-DESKEW"
+                    ],
+                    "output_file_grp": [
+                        "OCR-D-IMG-CROP"
+                    ],
+                    "parameters": {
+                        "colSeparator": {
+                            "default": 0.04,
+                            "description": "consider space between column. 25% of width",
+                            "format": "float",
+                            "type": "number"
+                        },
+                        "maxRularArea": {
+                            "default": 0.3,
+                            "description": "Consider maximum rular area",
+                            "format": "float",
+                            "type": "number"
+                        },
+                        "minArea": {
+                            "default": 0.05,
+                            "description": "rular position in below",
+                            "format": "float",
+                            "type": "number"
+                        },
+                        "minRularArea": {
+                            "default": 0.01,
+                            "description": "Consider minimum rular area",
+                            "format": "float",
+                            "type": "number"
+                        },
+                        "operation_level": {
                             "default": "page",
                             "description": "PAGE XML hierarchy level to operate on",
                             "enum": [
@@ -1379,82 +2053,80 @@
     {
         "files": {
             "Dockerfile": null,
-            "README.md": "# cor-asv-fst\n    OCR post-correction with error/lexicon Finite State Transducers and\n    chararacter-level LSTM language models\n\n## Introduction\n\n\n## Installation\n\nRequired Ubuntu packages:\n\n* Python (``python`` or ``python3``)\n* pip (``python-pip`` or ``python3-pip``)\n* virtualenv (``python-virtualenv`` or ``python3-virtualenv``)\n\nCreate and activate a virtualenv as usual.\n\nTo install Python dependencies and this module, then do:\n```shell\nmake deps install\n```\nWhich is the equivalent of:\n```shell\npip install -r requirements.txt\npip install -e .\n```\n\nIn addition to the requirements listed in `requirements.txt`, the tool\nrequires the\n[pynini](http://www.opengrm.org/twiki/bin/view/GRM/Pynini)\nlibrary, which has to be installed from source.\n\n## Usage\n\nThe package has two user interfaces:\n\n### Command Line Interface\n\nThe package contains a suite of CLI tools to work with plaintext data (prefix:\n`cor-asv-fst-*`). The minimal working examples and data formats are described\nbelow. Additionally, each tool has further optional parameters - for a detailed\ndescription, call the tool with the `--help` option.\n\n#### `cor-asv-fst-train`\n\nTrain FST models. The basic invocation is as follows:\n\n```shell\ncor-asv-fst-train -l LEXICON_FILE -e ERROR_MODEL_FILE -t TRAINING_FILE\n```\n\nThis will create two transducers, which will be stored in `LEXICON_FILE` and\n`ERROR_MODEL_FILE`, respectively. As the training of the lexicon and the error\nmodel is done independently, any of them can be skipped by omitting the\nrespective parameter.\n\n`TRAINING_FILE` is a plain text file in tab-separated, two-column format\ncontaining a line of OCR-output and the corresponding ground truth line:\n\n```\n\u00bb Bergebt mir, da\u00df ih niht wei\u00df, wie\t\u00bbVergebt mir, da\u00df ich nicht wei\u00df, wie\naus dem (Gei\u017fte aller Nationen Mahrunq\taus dem Gei\u017fte aller Nationen Nahrung\nKann\u017ft Du mir die re<h\u00e9e Bahn nich\u00e9 zeigen ?\tKann\u017ft Du mir die rechte Bahn nicht zeigen?\nfrag zu bringen. \u2014\ttrag zu bringen. \u2014\n\u017fie ins irdij<he Leben hinein, Mit leichtem,\t\u017fie ins irdi\u017fche Leben hinein. Mit leichtem,\n```\n\nEach line is treated independently. Alternatively to the above, the training\ndata may also be supplied as two files:\n\n```shell\ncor-asv-fst-train -l LEXICON_FILE -e ERROR_MODEL_FILE -i INPUT_FILE -g GT_FILE\n```\n\nIn this variant, `INPUT_FILE` and `GT_FILE` are both in tab-separated,\ntwo-column format, in which the first column is the line ID and the second the\nline:\n\n```\n>=== INPUT_FILE ===<\nalexis_ruhe01_1852_0018_022     ih denke. Aber was die \u017felige Frau Geheimr\u00e4th1n\nalexis_ruhe01_1852_0035_019     \u201eDas fann ich niht, c\u2019esl absolument impos-\nalexis_ruhe01_1852_0087_027     rend. In dem Augenbli> war 1hr niht wohl zu\nalexis_ruhe01_1852_0099_012     \u00fcr die fle \u017fich \u017fchlugen.\u201c\nalexis_ruhe01_1852_0147_009     \u017follte. Nur \u00dcber die Familien, wo man \u017fie einf\u00fchren\n\n>=== GT_FILE ===<\nalexis_ruhe01_1852_0018_022     ich denke. Aber was die \u017felige Frau Geheimr\u00e4thin\nalexis_ruhe01_1852_0035_019     \u201eDas kann ich nicht, c'est absolument impos\u2014\nalexis_ruhe01_1852_0087_027     rend. Jn dem Augenblick war ihr nicht wohl zu\nalexis_ruhe01_1852_0099_012     f\u00fcr die \u017fie \u017fich \u017fchlugen.\u201c\nalexis_ruhe01_1852_0147_009     \u017follte. 
Nur \u00fcber die Familien, wo man \u017fie einf\u00fchren\n```\n\n#### `cor-asv-fst-process`\n\nThis tool applies a trained model to correct plaintext data on a line basis.\nThe basic invocation is:\n\n```shell\ncor-asv-fst-process -i INPUT_FILE -o OUTPUT_FILE -l LEXICON_FILE -e ERROR_MODEL_FILE (-m LM_FILE)\n```\n\n`INPUT_FILE` is in the same format as for the training procedure. `OUTPUT_FILE`\ncontains the post-correction results in the same format.\n\n`LM_FILE` is a `ocrd_keraslm` language model - if supplied, it is used for\nrescoring.\n\n#### `cor-asv-fst-evaluate`\n\nThis tool can be used to evaluate the post-correction results. The minimal\nworking invocation is:\n\n```shell\ncor-asv-fst-evaluate -i INPUT_FILE -o OUTPUT_FILE -g GT_FILE\n```\n\nAdditionally, the parameter `-M` can be used to select the evaluation measure\n(`Levenshtein` by default). The files should be in the same two-column format\nas described above.\n\n### [OCR-D processor](https://github.com/OCR-D/core) interface `ocrd-cor-asv-fst-process`\n\nTo be used with [PageXML](https://www.primaresearch.org/tools/PAGELibraries)\ndocuments in an [OCR-D](https://github.com/OCR-D/spec/) annotation workflow.\nInput could be anything with a textual annotation (`TextEquiv` on the given\n`textequiv_level`).\n\n...\n\n```json\n  \"tools\": {\n    \"cor-asv-fst-process\": {\n      \"executable\": \"cor-asv-fst-process\",\n      \"categories\": [\n        \"Text recognition and optimization\"\n      ],\n      \"steps\": [\n        \"recognition/post-correction\"\n      ],\n      \"description\": \"Improve text annotation by FST error and lexicon model with character-level LSTM language model\",\n      \"input_file_grp\": [\n        \"OCR-D-OCR-TESS\",\n        \"OCR-D-OCR-KRAK\",\n        \"OCR-D-OCR-OCRO\",\n        \"OCR-D-OCR-CALA\",\n        \"OCR-D-OCR-ANY\"\n      ],\n      \"output_file_grp\": [\n        \"OCR-D-COR-ASV\"\n      ],\n      \"parameters\": {\n        \"keraslm_file\": {\n          \"type\": \"string\",\n          \"format\": \"uri\",\n          \"content-type\": \"application/x-hdf;subtype=bag\",\n          \"description\": \"path of h5py weight/config file for language model trained with keraslm\",\n          \"required\": true,\n          \"cacheable\": true\n        },\n        \"errorfst_file\": {\n          \"type\": \"string\",\n          \"format\": \"uri\",\n          \"content-type\": \"application/vnd.openfst\",\n          \"description\": \"path of FST file for error model\",\n          \"required\": true,\n          \"cacheable\": true\n        },\n        \"lexiconfst_file\": {\n          \"type\": \"string\",\n          \"format\": \"uri\",\n          \"content-type\": \"application/vnd.openfst\",\n          \"description\": \"path of FST file for lexicon model\",\n          \"required\": true,\n          \"cacheable\": true\n        },\n        \"textequiv_level\": {\n          \"type\": \"string\",\n          \"enum\": [\"word\", \"glyph\"],\n          \"default\": \"glyph\",\n          \"description\": \"PAGE XML hierarchy level to read TextEquiv input on (output will always be word level)\"\n        },\n        \"beam_width\": {\n          \"type\": \"number\",\n          \"format\": \"integer\",\n          \"description\": \"maximum number of best partial paths to consider during beam search in language modelling\",\n          \"default\": 100\n        },\n        \"lm_weight\": {\n          \"type\": \"number\",\n          \"format\": \"float\",\n          \"description\": \"share of the LM 
scores over the FST output confidences\",\n          \"default\": 0.5\n        }\n      }\n    }\n  }\n```\n\n...\n\n## Testing\n\n...\n",
-            "ocrd-tool.json": null,
-            "setup.py": "# -*- coding: utf-8 -*-\n\"\"\"\nInstalls:\n    - cor-asv-fst-train\n    - cor-asv-fst-process\n    - cor-asv-fst-evaluate\n    - ocrd-cor-asv-fst-process\n\"\"\"\nimport codecs\n\nfrom setuptools import setup, find_packages\n\ninstall_requires = open('requirements.txt').read().split('\\n')\n\nwith codecs.open('README.md', encoding='utf-8') as f:\n    README = f.read()\n\nsetup(\n    name='ocrd_cor_asv_fst',\n    version='0.2.0',\n    description='OCR post-correction with error/lexicon Finite State '\n                'Transducers and character-level LSTMs',\n    long_description=README,\n    author='Maciej Sumalvico, Robert Sachunsky',\n    author_email='sumalvico@informatik.uni-leipzig.de, '\n                 'sachunsky@informatik.uni-leipzig.de',\n    url='https://github.com/ASVLeipzig/cor-asv-fst',\n    license='Apache License 2.0',\n    packages=find_packages(exclude=('tests', 'docs')),\n    install_requires=install_requires,\n    package_data={\n        '': ['*.json', '*.yml', '*.yaml'],\n    },\n    test_suite='tests',\n    entry_points={\n        'console_scripts': [\n            'cor-asv-fst-train=ocrd_cor_asv_fst.scripts.train:main',\n            'cor-asv-fst-process=ocrd_cor_asv_fst.scripts.process:main',\n            'cor-asv-fst-evaluate=ocrd_cor_asv_fst.scripts.evaluate:main',\n            'ocrd-cor-asv-fst-process=ocrd_cor_asv_fst.wrapper.cli:ocrd_cor_asv_fst',\n        ]\n    }\n)\n"
+            "README.md": "dinglehopper\n============\n\ndinglehopper is an OCR evaluation tool and reads [ALTO](https://github.com/altoxml), [PAGE](https://github.com/PRImA-Research-Lab/PAGE-XML) and text files.\n\n[![Build Status](https://travis-ci.org/qurator-spk/dinglehopper.svg?branch=master)](https://travis-ci.org/qurator-spk/dinglehopper)\n\nGoals\n-----\n* Useful\n  * As an UI tool\n  * For an automated evaluation\n  * As a library\n* Unicode support\n\nUsage\n-----\n~~~\ndinglehopper some-document.gt.page.xml some-document.ocr.alto.xml\n~~~\nThis generates `report.html` and `report.json`.\n\n\nAs a OCR-D processor:\n~~~\nocrd-dinglehopper -m mets.xml -I OCR-D-GT-PAGE,OCR-D-OCR-TESS -O OCR-D-OCR-TESS-EVAL\n~~~\nThis generates HTML and JSON reports in the `OCR-D-OCR-TESS-EVAL` filegroup.\n\n\n![dinglehopper displaying metrics and character differences](.screenshots/dinglehopper.png?raw=true)\n",
+            "ocrd-tool.json": "{\n  \"git_url\": \"https://github.com/qurator-spk/dinglehopper\",\n  \"tools\": {\n    \"ocrd-dinglehopper\": {\n      \"executable\": \"ocrd-dinglehopper\",\n      \"description\": \"Evaluate OCR text against ground truth with dinglehopper\",\n      \"input_file_grp\": [\n        \"OCR-D-GT-PAGE\",\n        \"OCR-D-OCR\"\n      ],\n      \"output_file_grp\": [\n        \"OCR-D-OCR-EVAL\"\n      ],\n      \"categories\": [\n        \"Quality assurance\"\n      ],\n      \"steps\": [\n        \"recognition/text-recognition\"\n      ]\n    }\n  }\n}\n",
+            "setup.py": "from io import open\nfrom setuptools import find_packages, setup\n\nwith open('requirements.txt') as fp:\n    install_requires = fp.read()\n\nsetup(\n    name='dinglehopper',\n    author='Mike Gerber, The QURATOR SPK Team',\n    author_email='mike.gerber@sbb.spk-berlin.de, qurator@sbb.spk-berlin.de',\n    description='The OCR evaluation tool',\n    long_description=open('README.md', 'r', encoding='utf-8').read(),\n    long_description_content_type='text/markdown',\n    keywords='qurator ocr',\n    license='Apache',\n    namespace_packages=['qurator'],\n    packages=find_packages(exclude=['*.tests', '*.tests.*', 'tests.*', 'tests']),\n    install_requires=install_requires,\n    package_data={\n        '': ['*.json', 'templates/*'],\n    },\n    entry_points={\n      'console_scripts': [\n        'dinglehopper=qurator.dinglehopper.cli:main',\n        'ocrd-dinglehopper=qurator.dinglehopper.ocrd_cli:ocrd_dinglehopper',\n      ]\n    }\n)\n"
         },
         "git": {
-            "last_commit": "Tue Jul 23 17:00:16 2019 +0200",
-            "number_of_commits": "1"
-        },
-        "name": "cor-asv-fst",
-        "ocrd_tool": "",
-        "ocrd_tool_validate": "NO ocrd-tool.json",
-        "org_plus_name": "ASVLeipzig/cor-asv-fst",
-        "python": {
-            "author": "Maciej Sumalvico, Robert Sachunsky",
-            "author-email": "sumalvico@informatik.uni-leipzig.de, sachunsky@informatik.uni-leipzig.de",
-            "name": "ocrd_cor_asv_fst",
-            "url": "https://github.com/ASVLeipzig/cor-asv-fst"
-        },
-        "url": "https://github.com/ASVLeipzig/cor-asv-fst"
-    },
-    {
-        "files": {
-            "Dockerfile": null,
-            "README.md": "# cor-asv-ann\n    OCR post-correction with encoder-attention-decoder LSTMs\n\n## Introduction\n\nThis is a tool for automatic OCR _post-correction_ (reducing optical character recognition errors) with recurrent neural networks. It uses sequence-to-sequence transduction on the _character level_ with a model architecture akin to neural machine translation, i.e. a stacked **encoder-decoder** network with attention mechanism. \n\nThe **attention model** always applies to full lines (in a _global_ configuration), and uses a linear _additive_ alignment model. (This transfers information between the encoder and decoder hidden layer states, and calculates a _soft alignment_ between input and output characters. It is imperative for character-level processing, because with a simple final-initial transfer, models tend to start \"forgetting\" the input altogether at some point in the line and behave like unconditional LM generators.)\n\n...FIXME: mention: \n- stacked architecture (with bidirectional bottom and attentional top), configurable depth/width\n- weight tying\n- underspecification and gap\n- confidence input and alternative input\n- CPU/GPU option\n- incremental training, LM transfer, shallow transfer\n- evaluation (CER, PPL)\n\n### Processing PAGE annotations\n\nWhen applied on PAGE-XML (as OCR-D workspace processor), this component also allows processing below the `TextLine` hierarchy level, i.e. on `Word` or `Glyph` level. For that it uses the soft alignment scores to calculate an optimal hard alignment path for characters, and thereby distributes the transduction onto the lower level elements (keeping their coordinates and other meta-data), while changing Word segmentation if necessary.\n\n...\n\n### Architecture\n\n...FIXME: show!\n\n### Input with confidence and/or alternatives\n\n...FIXME: explain!\n\n### Multi-OCR input\n\nnot yet!\n\n### Modes\n\nWhile the _encoder_ can always be run in parallel over a batch of lines and by passing the full sequence of characters in one tensor (padded to the longest line in the batch), which is very efficient with Keras backends like Tensorflow, a **beam-search** _decoder_ requires passing initial/final states character-by-character, with parallelism employed to capture multiple history hypotheses of a single line. However, one can also **greedily** use the best output only for each position (without beam search). And in doing so, another option is to feed back the softmax output directly into the decoder input instead of its argmax unit vector. This effectively passes the full probability distribution from state to state, which (not very surprisingly) can increase correction accuracy quite a lot \u2013 it can get as good as a medium-sized beam search results. This latter option also allows to run in parallel again, which is also much faster \u2013 consuming up to ten times less CPU time.\n\nThererfore, the backend function `lib.Sequence2Sequence.correct_lines` can operate the encoder-decoder network in either of the following modes:\n\n#### _fast_\n\nDecode greedily, but feeding back the full softmax distribution in batch mode.\n\n#### _greedy_\n\nDecode greedily, but feeding back the argmax unit vectors for each line separately.\n\n#### _default_\n\nDecode beamed, feeding back the argmax unit vectors for the best history/output hypotheses of each line. 
More specifically:\n\n> Start decoder with start-of-sequence, then keep decoding until\n> end-of-sequence is found or output length is way off, repeatedly.\n> Decode by using the best predicted output characters and several next-best\n> alternatives (up to some degradation threshold) as next input.\n> Follow-up on the N best overall candidates (estimated by accumulated\n> score, normalized by length and prospective cost), i.e. do A*-like\n> breadth-first search, with N equal `batch_size`.\n> Pass decoder initial/final states from character to character,\n> for each candidate respectively.\n> Reserve 1 candidate per iteration for running through `source_seq`\n> (as a rejection fallback) to ensure that path does not fall off the\n> beam and at least one solution can be found within the search limits.\n\n### Evaluation\n\nText lines can be compared (by aligning and computing a distance under some metric) across multiple inputs. (This would typically be GT and OCR vs post-correction.) This can be done both on plain text files (`cor-asv-ann-eval`) and PAGE-XML annotations (`ocrd-cor-asv-ann-evaluate`).\n\nThere are a number of distance metrics available:\n- `Levenshtein`: simple unweighted edit distance (fastest, standard)\n- `combining-e-umlauts`: like the former, but umlauts with combining letter e get smaller distance to precomposed umlauts (and vice versa), as in \"Wu\u0364\u017fte\" (as opposed to \"W\u00fc\u017fte\")\n- `historic_latin`: like the former, but with additional exceptions (i.e. zero distances) for certain (isolated) character confusions \u2013 roughly the difference between GT level 1 and 2\n- `NFC`: like `Levenshtein`, but apply Unicode normal form with canonical composition before (i.e. less than `historic_latin`)\n- `NFKC`: like `Levenshtein`, but apply Unicode normal form with compatibility composition before (i.e. more than `historic_latin`)\n\n\n## Installation\n\nRequired Ubuntu packages:\n\n* Python (``python`` or ``python3``)\n* pip (``python-pip`` or ``python3-pip``)\n* virtualenv (``python-virtualenv`` or ``python3-virtualenv``)\n\nCreate and activate a virtualenv as usual.\n\nTo install Python dependencies and this module, then do:\n```shell\nmake deps install\n```\nWhich is the equivalent of:\n```shell\npip install -r requirements.txt\npip install -e .\n```\n\n## Usage\n\nThis packages has the following user interfaces:\n\n### command line interface `cor-asv-ann-train`\n\nTo be used with string arguments and plain-text files.\n\n...\n\n### command line interface `cor-asv-ann-eval`\n\nTo be used with string arguments and plain-text files.\n\n...\n\n### command line interface `cor-asv-ann-repl`\n\ninteractive\n\n...\n\n### [OCR-D processor](https://github.com/OCR-D/core) interface `ocrd-cor-asv-ann-process`\n\nTo be used with [PageXML](https://www.primaresearch.org/tools/PAGELibraries) documents in an [OCR-D](https://github.com/OCR-D/spec/) annotation workflow. Input could be anything with a textual annotation (`TextEquiv` on the given `textequiv_level`). 
\n\n...\n\n```json\n    \"ocrd-cor-asv-ann-process\": {\n      \"executable\": \"ocrd-cor-asv-ann-process\",\n      \"categories\": [\n        \"Text recognition and optimization\"\n      ],\n      \"steps\": [\n        \"recognition/post-correction\"\n      ],\n      \"description\": \"Improve text annotation by character-level encoder-attention-decoder ANN model\",\n      \"input_file_grp\": [\n        \"OCR-D-OCR-TESS\",\n        \"OCR-D-OCR-KRAK\",\n        \"OCR-D-OCR-OCRO\",\n        \"OCR-D-OCR-CALA\",\n        \"OCR-D-OCR-ANY\"\n      ],\n      \"output_file_grp\": [\n        \"OCR-D-COR-ASV\"\n      ],\n      \"parameters\": {\n        \"model_file\": {\n          \"type\": \"string\",\n          \"format\": \"uri\",\n          \"content-type\": \"application/x-hdf;subtype=bag\",\n          \"description\": \"path of h5py weight/config file for model trained with cor-asv-ann-train\",\n          \"required\": true,\n          \"cacheable\": true\n        },\n        \"textequiv_level\": {\n          \"type\": \"string\",\n          \"enum\": [\"line\", \"word\", \"glyph\"],\n          \"default\": \"glyph\",\n          \"description\": \"PAGE XML hierarchy level to read/write TextEquiv input/output on\"\n        }\n      }\n    }\n```\n\n...\n\n### [OCR-D processor](https://github.com/OCR-D/core) interface `ocrd-cor-asv-ann-evaluate`\n\nTo be used with [PageXML](https://www.primaresearch.org/tools/PAGELibraries) documents in an [OCR-D](https://github.com/OCR-D/spec/) annotation workflow. Inputs could be anything with a textual annotation (`TextEquiv` on the line level), but at least 2. The first in the list of input file groups will be regarded as reference/GT.\n\n...\n\n```json\n    \"ocrd-cor-asv-ann-evaluate\": {\n      \"executable\": \"ocrd-cor-asv-ann-evaluate\",\n      \"categories\": [\n        \"Text recognition and optimization\"\n      ],\n      \"steps\": [\n        \"recognition/evaluation\"\n      ],\n      \"description\": \"Align different textline annotations and compute distance\",\n      \"parameters\": {\n        \"metric\": {\n          \"type\": \"string\",\n          \"enum\": [\"Levenshtein\", \"combining-e-umlauts\", \"NFC\", \"NFKC\", \"historic_latin\"],\n          \"default\": \"Levenshtein\",\n          \"description\": \"Distance metric to calculate and aggregate\"\n        }\n      }\n    }\n```\n\n...\n\n## Testing\n\nnot yet!\n...\n",
-            "ocrd-tool.json": null,
-            "setup.py": "# -*- coding: utf-8 -*-\n\"\"\"\nInstalls:\n    - cor-asv-ann-train\n    - cor-asv-ann-eval\n    - cor-asv-ann-repl\n    - ocrd-cor-asv-ann-process\n    - ocrd-cor-asv-ann-evaluate\n\"\"\"\nimport codecs\n\nfrom setuptools import setup, find_packages\n\ninstall_requires = open('requirements.txt').read().split('\\n')\n\nwith codecs.open('README.md', encoding='utf-8') as f:\n    README = f.read()\n\nsetup(\n    name='ocrd_cor_asv_ann',\n    version='0.1.1',\n    description='sequence-to-sequence translator for noisy channel error correction',\n    long_description=README,\n    author='Robert Sachunsky',\n    author_email='sachunsky@informatik.uni-leipzig.de',\n    url='https://github.com/ASVLeipzig/cor-asv-ann',\n    license='Apache License 2.0',\n    packages=find_packages(exclude=('tests', 'docs')),\n    install_requires=install_requires,\n    package_data={\n        '': ['*.json', '*.yml', '*.yaml'],\n    },\n    entry_points={\n        'console_scripts': [\n            'cor-asv-ann-train=ocrd_cor_asv_ann.scripts.train:cli',\n            'cor-asv-ann-eval=ocrd_cor_asv_ann.scripts.eval:cli',\n            'cor-asv-ann-repl=ocrd_cor_asv_ann.scripts.repl:cli',\n            'ocrd-cor-asv-ann-process=ocrd_cor_asv_ann.wrapper.cli:ocrd_cor_asv_ann_process',\n            'ocrd-cor-asv-ann-evaluate=ocrd_cor_asv_ann.wrapper.cli:ocrd_cor_asv_ann_evaluate',\n        ]\n    },\n)\n"
+            "last_commit": "Fri Oct 18 17:45:24 2019 +0200",
+            "number_of_commits": "32"
         },
-        "git": {
-            "last_commit": "Fri Jul 19 23:53:14 2019 +0200",
-            "number_of_commits": "1"
+        "name": "dinglehopper",
+        "ocrd_tool": {
+            "git_url": "https://github.com/qurator-spk/dinglehopper",
+            "tools": {
+                "ocrd-dinglehopper": {
+                    "categories": [
+                        "Quality assurance"
+                    ],
+                    "description": "Evaluate OCR text against ground truth with dinglehopper",
+                    "executable": "ocrd-dinglehopper",
+                    "input_file_grp": [
+                        "OCR-D-GT-PAGE",
+                        "OCR-D-OCR"
+                    ],
+                    "output_file_grp": [
+                        "OCR-D-OCR-EVAL"
+                    ],
+                    "steps": [
+                        "recognition/text-recognition"
+                    ]
+                }
+            }
         },
-        "name": "cor-asv-ann",
-        "ocrd_tool": "",
-        "ocrd_tool_validate": "NO ocrd-tool.json",
-        "org_plus_name": "ASVLeipzig/cor-asv-ann",
+        "ocrd_tool_validate": "<report valid=\"false\">\n  <error>[] 'version' is a required property</error>\n</report>",
+        "org_plus_name": "qurator-spk/dinglehopper",
         "python": {
-            "author": "Robert Sachunsky",
-            "author-email": "sachunsky@informatik.uni-leipzig.de",
-            "name": "ocrd_cor_asv_ann",
-            "url": "https://github.com/ASVLeipzig/cor-asv-ann"
+            "author": "Mike Gerber, The QURATOR SPK Team",
+            "author-email": "mike.gerber@sbb.spk-berlin.de, qurator@sbb.spk-berlin.de",
+            "name": "dinglehopper",
+            "url": "UNKNOWN"
         },
-        "url": "https://github.com/ASVLeipzig/cor-asv-ann"
+        "url": "https://github.com/qurator-spk/dinglehopper"
     },
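[Editor's note: for orientation, a minimal sketch (not part of the patch) of how the updated repos.json could be consumed. It assumes only the layout visible in this diff: a JSON array of objects with "name", "url" and "ocrd_tool_validate" keys, where the literal string "NO ocrd-tool.json" marks repos without a tool description.]

```python
import json

# Minimal sketch: list each repo and whether it ships an ocrd-tool.json,
# based on the "ocrd_tool_validate" field shown in this patch.
with open('repos.json', encoding='utf-8') as f:
    repos = json.load(f)

for repo in repos:
    has_tool = repo['ocrd_tool_validate'] != 'NO ocrd-tool.json'
    marker = 'yes' if has_tool else 'no'
    print(f"{repo['name']:40} ocrd-tool.json: {marker:3}  {repo['url']}")
```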
     {
         "files": {
-            "Dockerfile": "FROM ocrd/core\nMAINTAINER Florian Fink <finkf@cis.lmu.de>\nENV OCRD_VERSION \"ocrd-0.1\"\nENV PROFILER_GIT https://github.com/cisocrgroup/Profiler\nENV LC_ALL C.UTF-8\nENV LANG C.UTF-8\n\nVOLUME [\"/data\"]\n# update\nRUN apt-get update && \\\n    apt-get install -y git cmake g++ libxerces-c-dev libcppunit-dev openjdk-8-jre\n\nENV VERSION \"2018-06-06\"\n# install profiler\nRUN mkdir /src && \\\n    cd /src && \\\n    git clone -b ocrd ${PROFILER_GIT} && \\\n    cd Profiler && mkdir build && cd build && \\\n    cmake -DCMAKE_BUILD_TYPE=release .. && \\\n    make -j 4 profiler && \\\n    mkdir /apps && \\\n    cp bin/profiler /apps/ && \\\n    cd / && \\\n\t\trm -rf /src/Profiler\nCOPY target/${OCRD_VERSION}-cli.jar /apps/\n\nENTRYPOINT [\"/bin/sh\", \"-c\"]\n#COPY target/${OCRD_VERSION}.war /usr/local/tomcat/webapps\n#COPY tomcat-users.xml ${CATALINA_HOME}/conf/tomcat-users.xml\n#COPY context.xml ${CATALINA_HOME}/webapps/manager/META-INF/context.xml\n#COPY context.xml ${CATALINA_HOME}/webapps/host-manager/META-INF/context.xml\n",
-            "README.md": "![build status](https://travis-ci.org/cisocrgroup/ocrd-postcorrection.svg?branch=dev)\nOCRD REST API\n====================\n\n\nEclipse/Maven project for OCRD Application Server REST API & UIF\n\n## prerequisites\n- JDK8 or better\n- Maven 3\n- Eclipse 4.X or better (installed maven plugin & e.g. Spring STS)\n\n\n## build, run & dev-steps\n\n### import to eclipse\n1. clone repo to your local machine\n2. Open Eclipse\n3. File->Import->Maven->existing Maven project\n4. Run from new projects's context menu Maven->Update project\n5. New Java code from RAML is generated in /target/generated-sources/raml-jaxrs\n\n\n### run local jetty dev-server\n\n#### steps:\n- run eclipse project on server: `RunLocalJetty.java`\n- Point your browser to http://localhost:8181\n    * `/` shows html & js pages which reside in `src/webapp`\n    * `/api/` points to all api endpoints\n\n\n#### API documentation:\n- install raml2html module (https://github.com/raml2html/raml2html)\n- to generate use: raml2html api.raml > api.html\n\n\n\nOCRD WEB APP\n====================\n\n- sources in webapp-src folder\n\n### contents\nall source files which are needed to build the webapp (e.g. backbone js code & HTML markup code)\n\n### build\nuse build scripts to create a build. The build-script shall sync the build to `src/webapp`\n",
+            "Dockerfile": null,
+            "README.md": "# pixelwise_segmentation_SBB\nThis is a tool for pixel wise segmentation. This is developed in order to do use cases like page extraction, textline , word \nand structure recognition of library documents.\n\nDownload the repository and after you prepared your data, just run:\n\npython train.py with config_params.json\n",
             "ocrd-tool.json": null,
             "setup.py": null
         },
         "git": {
-            "last_commit": "Wed Jul 11 10:05:04 2018 +0200",
-            "number_of_commits": "1"
+            "last_commit": "Wed Jul 10 12:30:57 2019 +0200",
+            "number_of_commits": "6"
         },
-        "name": "ocrd-postcorrection",
+        "name": "pixelwise_segmentation_SBB",
         "ocrd_tool": "",
         "ocrd_tool_validate": "NO ocrd-tool.json",
-        "org_plus_name": "cisocrgroup/ocrd-postcorrection",
+        "org_plus_name": "qurator-spk/pixelwise_segmentation_SBB",
         "python": {
             "author": "",
             "author-email": "",
             "name": "",
             "url": ""
         },
-        "url": "https://github.com/cisocrgroup/ocrd-postcorrection"
+        "url": "https://github.com/qurator-spk/pixelwise_segmentation_SBB"
     },
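[Editor's note: the "ocrd_tool_validate" entries above embed small XML reports of the form `<report valid="false"><error>...</error></report>`. A hedged sketch of extracting the individual error messages from those reports, again assuming only the structure visible in this patch:]

```python
import json
import xml.etree.ElementTree as ET

with open('repos.json', encoding='utf-8') as f:
    repos = json.load(f)

for repo in repos:
    report = repo['ocrd_tool_validate']
    if not report.startswith('<report'):
        continue  # repos without an ocrd-tool.json store a plain string here
    # Parse the embedded XML report and print each <error> child element.
    for error in ET.fromstring(report).findall('error'):
        print(f"{repo['name']}: {error.text}")
```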
     {
         "files": {
             "Dockerfile": null,
             "README.md": "# ocrd_typegroups_classifier\n\n> Typegroups classifier for OCR\n\n## Quick setup\n\nIf needed, create a virtual environment for Python 3 (it was tested\nsuccessfully with Python 3.7), activate it, and install ocrd.\n\n```sh\nvirtualenv -p python3 ocrd-venv3\nsource ocrd-venv3/bin/activate\npip3 install ocrd\n```\n\nEnter in the folder containing the tool:\n```cd ocrd_typegroups_classifier/```\n\nInstall the module and its dependencies\n\n```\nmake install\nmake deps\n```\n\nFinally, run the test:\n\n```\nsh test/test.sh\n```\n\n** Important: ** The test makes sure that the system does work. For\nspeed reasons, a very small neural network is used and applied only to\nthe top-left corner of the image, therefore the quality of the results\nwill be of poor quality.\n\n## Models\n\nThe model classifier-1.tgc is based on a ResNet-18, with less neurons\nper layer than the usual model. It was briefly trained on 12 classes:\nAdornment, Antiqua, Bastarda, Book covers and other irrelevant data,\nEmpty Pages, Fraktur, Griechisch, Hebr\u00e4isch, Kursiv, Rotunda, Textura,\nand Woodcuts - Engravings.\n\n## Heatmap Generation ##\nGiven a trained model, it is possible to produce heatmaps corresponding\nto classification results. Syntax:\n\n```\npython3 tools/heatmap.py ocrd_typegroups_classifier/models/classifier.tgc sample.jpg out\n```",
             "ocrd-tool.json": "{\n  \"version\": \"0.0.1\",\n  \"git_url\": \"https://github.com/seuretm/ocrd_typegroups_classifier\",\n  \"tools\": {\n    \"ocrd-typegroups-classifier\": {\n      \"executable\": \"ocrd-typegroups-classifier\",\n      \"description\": \"Classification of 15th century type groups\",\n      \"categories\": [\n        \"Text recognition and optimization\"\n      ],\n      \"steps\": [\n        \"recognition/font-identification\"\n      ],\n      \"parameters\": {\n        \"network\": {\n          \"description\": \"The file name of the neural network to use, including sufficient path information\",\n          \"type\": \"string\",\n          \"required\": true\n        },\n        \"stride\": {\n          \"description\": \"Stride applied to the CNN on the image. Should be between 1 and 224. Smaller values increase the computation time.\",\n          \"type\": \"number\",\n          \"format\": \"integer\",\n          \"default\": 112\n        }\n      }\n    }\n  }\n}\n",
-            "setup.py": "# -*- coding: utf-8 -*-\nimport codecs\n\nfrom setuptools import setup, find_packages\n\nwith codecs.open('README.md', encoding='utf-8') as f:\n    README = f.read()\n\nsetup(\n    name='ocrd_typegroups_classifier',\n    version='0.0.1',\n    description='Typegroups classifier for OCR',\n    long_description=README,\n    long_description_content_type='text/markdown',\n    author='Matthias Seuret, Konstantin Baierer',\n    author_email='seuretm@users.noreply.github.com',\n    url='https://github.com/seuretm/ocrd_typegroups_classifier',\n    license='Apache License 2.0',\n    packages=find_packages(exclude=('tests', 'docs')),\n    include_package_data=True,\n    install_requires=[\n        'click',\n        'ocrd >= 1.0.0b7',\n        'pandas',\n        'Pillow >= 5.3.0',\n        'scikit-image',\n        'torch',\n        'torchvision',\n    ],\n    package_data={\n        '': ['*.json', '*.tgc'],\n    },\n    entry_points={\n        'console_scripts': [\n            'typegroups-classifier=ocrd_typegroups_classifier.cli.simple:cli',\n            'ocrd-typegroups-classifier=ocrd_typegroups_classifier.cli.ocrd_cli:cli',\n        ]\n    },\n)\n"
+            "setup.py": "# -*- coding: utf-8 -*-\nimport codecs\n\nfrom setuptools import setup, find_packages\n\nwith codecs.open('README.md', encoding='utf-8') as f:\n    README = f.read()\n\nsetup(\n    name='ocrd_typegroups_classifier',\n    version='0.0.1',\n    description='Typegroups classifier for OCR',\n    long_description=README,\n    long_description_content_type='text/markdown',\n    author='Matthias Seuret, Konstantin Baierer',\n    author_email='seuretm@users.noreply.github.com',\n    url='https://github.com/seuretm/ocrd_typegroups_classifier',\n    license='Apache License 2.0',\n    packages=find_packages(exclude=('tests', 'docs')),\n    include_package_data=True,\n    install_requires=[\n        'click',\n        'ocrd >= 1.0.0b7',\n        'pandas',\n        'Pillow == 5.4.1',\n        'scikit-image',\n        'torch >= 1.2.0',\n        'torchvision',\n    ],\n    package_data={\n        '': ['*.json', '*.tgc'],\n    },\n    entry_points={\n        'console_scripts': [\n            'typegroups-classifier=ocrd_typegroups_classifier.cli.simple:cli',\n            'ocrd-typegroups-classifier=ocrd_typegroups_classifier.cli.ocrd_cli:cli',\n        ]\n    },\n)\n"
         },
         "git": {
-            "last_commit": "Tue Apr 30 16:09:15 2019 +0200",
-            "number_of_commits": "47"
+            "last_commit": "Fri Sep 6 11:52:17 2019 +0200",
+            "number_of_commits": "67"
         },
         "name": "ocrd_typegroups_classifier",
         "ocrd_tool": {
@@ -1486,7 +2158,7 @@
             },
             "version": "0.0.1"
         },
-        "ocrd_tool_validate": "<report valid=\"false\">\n  <error>[tools.ocrd-typegroups-classifier] 'input_file_grp' is a required property</error>\n  <error>[tools.ocrd-typegroups-classifier] 'output_file_grp' is a required property</error>\n</report>",
+        "ocrd_tool_validate": "<report valid=\"false\">\n  <error>[tools.ocrd-typegroups-classifier] 'input_file_grp' is a required property</error>\n</report>",
         "org_plus_name": "seuretm/ocrd_typegroups_classifier",
         "python": {
             "author": "Matthias Seuret, Konstantin Baierer",
@@ -1495,49 +2167,5 @@
             "url": "https://github.com/seuretm/ocrd_typegroups_classifier"
         },
         "url": "https://github.com/seuretm/ocrd_typegroups_classifier"
-    },
-    {
-        "files": {
-            "Dockerfile": null,
-            "README.md": "dinglehopper\n============\n\ndinglehopper is an OCR evaluation tool and reads [ALTO](https://github.com/altoxml), [PAGE](https://github.com/PRImA-Research-Lab/PAGE-XML) and text files.\n\n[![Build Status](https://travis-ci.org/qurator-spk/dinglehopper.svg?branch=master)](https://travis-ci.org/qurator-spk/dinglehopper)\n\nGoals\n-----\n* Useful\n  * As an UI tool\n  * For an automated evaluation\n  * As a library\n* Unicode support\n\nUsage\n-----\n~~~\ndinglehopper some-document.gt.page.xml some-document.ocr.alto.xml\n~~~\nThis generates `report.html` and `report.json`.\n\n\nAs a OCR-D processor:\n~~~\nocrd-dinglehopper -m mets.xml -I OCR-D-GT-PAGE,OCR-D-OCR-TESS -O OCR-D-OCR-TESS-EVAL\n~~~\nThis generates HTML and JSON reports in the `OCR-D-OCR-TESS-EVAL` filegroup.\n\n\n![dinglehopper displaying metrics and character differences](.screenshots/dinglehopper.png?raw=true)\n",
-            "ocrd-tool.json": "{\n  \"git_url\": \"https://github.com/qurator-spk/dinglehopper\",\n  \"tools\": {\n    \"ocrd-dinglehopper\": {\n      \"executable\": \"ocrd-dinglehopper\",\n      \"description\": \"Evaluate OCR text against ground truth with dinglehopper\",\n      \"input_file_grp\": [\n        \"OCR-D-GT-PAGE\",\n        \"OCR-D-OCR\"\n      ],\n      \"output_file_grp\": [\n        \"OCR-D-OCR-EVAL\"\n      ],\n      \"categories\": [\n        \"Quality assurance\"\n      ],\n      \"steps\": [\n        \"recognition/text-recognition\"\n      ]\n    }\n  }\n}\n",
-            "setup.py": "from io import open\nfrom setuptools import find_packages, setup\n\nwith open('requirements.txt') as fp:\n    install_requires = fp.read()\n\nsetup(\n    name='dinglehopper',\n    author='Mike Gerber, The QURATOR SPK Team',\n    author_email='mike.gerber@sbb.spk-berlin.de, qurator@sbb.spk-berlin.de',\n    description='The OCR evaluation tool',\n    long_description=open('README.md', 'r', encoding='utf-8').read(),\n    long_description_content_type='text/markdown',\n    keywords='qurator ocr',\n    license='Apache',\n    namespace_packages=['qurator'],\n    packages=find_packages(exclude=['*.tests', '*.tests.*', 'tests.*', 'tests']),\n    install_requires=install_requires,\n    package_data={\n        '': ['*.json', 'templates/*'],\n    },\n    entry_points={\n      'console_scripts': [\n        'dinglehopper=qurator.dinglehopper.cli:main',\n        'ocrd-dinglehopper=qurator.dinglehopper.ocrd_cli:ocrd_dinglehopper',\n      ]\n    }\n)\n"
-        },
-        "git": {
-            "last_commit": "Fri Oct 18 17:45:24 2019 +0200",
-            "number_of_commits": "1"
-        },
-        "name": "dinglehopper",
-        "ocrd_tool": {
-            "git_url": "https://github.com/qurator-spk/dinglehopper",
-            "tools": {
-                "ocrd-dinglehopper": {
-                    "categories": [
-                        "Quality assurance"
-                    ],
-                    "description": "Evaluate OCR text against ground truth with dinglehopper",
-                    "executable": "ocrd-dinglehopper",
-                    "input_file_grp": [
-                        "OCR-D-GT-PAGE",
-                        "OCR-D-OCR"
-                    ],
-                    "output_file_grp": [
-                        "OCR-D-OCR-EVAL"
-                    ],
-                    "steps": [
-                        "recognition/text-recognition"
-                    ]
-                }
-            }
-        },
-        "ocrd_tool_validate": "<report valid=\"false\">\n  <error>[] 'version' is a required property</error>\n</report>",
-        "org_plus_name": "qurator-spk/dinglehopper",
-        "python": {
-            "author": "Mike Gerber, The QURATOR SPK Team",
-            "author-email": "mike.gerber@sbb.spk-berlin.de, qurator@sbb.spk-berlin.de",
-            "name": "dinglehopper",
-            "url": "UNKNOWN"
-        },
-        "url": "https://github.com/qurator-spk/dinglehopper"
     }
 ]
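[Editor's note: the ocrd-typegroups-classifier entry above declares two parameters, "network" (required) and "stride" (integer, default 112). The following hypothetical sketch writes a parameter file for it; the model path is taken from the heatmap example in that repo's README, while the file name and file groups in the invocation comment are placeholders.]

```python
import json

# Hypothetical parameter file for ocrd-typegroups-classifier, using the two
# parameters declared in its ocrd-tool.json: "network" (path to the .tgc
# model) and "stride" (integer, default 112).
params = {
    'network': 'ocrd_typegroups_classifier/models/classifier.tgc',
    'stride': 112,
}
with open('typegroups-params.json', 'w', encoding='utf-8') as f:
    json.dump(params, f, indent=2)

# Invocation would then follow the usual OCR-D processor pattern, e.g.
# (input/output file groups are placeholders):
#   ocrd-typegroups-classifier -m mets.xml -I OCR-D-IMG -O OCR-D-FONT -p typegroups-params.json
```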
-- 
GitLab