Unverified commit 7ad3724a authored by Paul Pestov, committed by GitHub

docs: improve README


* Update README.md

* docs: add Docker Compose plugin as requirement

* docs: use docker compose

---------

Co-authored-by: Michelle Weidling <weidling@sub.uni-goettingen.de>
parent 18efd08a
1 merge request: !1 Merge GitHub's state
.gitignore:

@@ -7,4 +7,5 @@ work/
workflows/nf-results/*
workflows/results
workflows/ocrd-workflows/*.nf
models
.idea
# QuiVer Benchmarks
This repository holds everything you need to automatically execute different OCR-D workflows on images and evaluate the outcomes.
It creates benchmarks for (your) OCR-D data in a containerized environment.
You can run QuiVer Benchmarks either locally on your machine or in an automated workflow, e.g. in a CI/CD environment.
QuiVer Benchmarks is based on `ocrd/all:maximum` and has all OCR-D processors at hand that a workflow might use.
## Requirements
- Docker >= 23.0.0
- [Docker Compose plugin](https://docs.docker.com/compose/install/linux/#install-using-the-repository)
To speed up QuiVer Benchmarks you can mount already downloaded text recognition models to `/usr/local/share/ocrd-resources/` in `docker-compose.yml` by adding
```yml
- path/to/your/models:/usr/local/share/ocrd-resources/
```
to the `volumes` section.
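For context, the mount goes under the `volumes` key of the service in your `docker-compose.yml`; a minimal sketch (the service name `quiver-benchmarks` is illustrative here — use whatever name the shipped compose file defines):

```yml
services:
  quiver-benchmarks:   # illustrative name; keep the service name from the shipped file
    volumes:
      - path/to/your/models:/usr/local/share/ocrd-resources/
```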
Otherwise, the tool will download all `ocrd-tesserocr-recognize` models as well as the `qurator-gt4histocr-1.0` model for `ocrd-calamari-recognize` on each run.
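If you prefer to pre-fetch these models yourself before mounting them, OCR-D's resource manager can download them; a sketch, assuming an environment where the `ocrd` CLI is installed:

```sh
# Download all models registered for ocrd-tesserocr-recognize
ocrd resmgr download ocrd-tesserocr-recognize '*'

# Download the GT4HistOCR Calamari model
ocrd resmgr download ocrd-calamari-recognize qurator-gt4histocr-1.0
```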
## Usage
- clone this repository
- (optional) [customize](#custom-workflows-and-data) QuiVer Benchmarks according to your needs
- run `docker compose up --build`
- the benchmarks and the evaluation results will be available at `data/workflows.json` on your host system
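The result file is plain JSON, so any JSON tooling can inspect it. A minimal sketch in Python, using a made-up sample entry (the field names `workflow` and `cer_median` are illustrative — the actual schema of `data/workflows.json` is defined by QuiVer Benchmarks and may differ):

```python
import json

# Made-up sample mimicking the general shape of data/workflows.json;
# consult the real file for the actual schema.
sample = '[{"workflow": "minimal_ocr", "cer_median": 0.12}]'

results = json.loads(sample)
for entry in results:
    print(f'{entry["workflow"]}: median CER {entry["cer_median"]}')
# → minimal_ocr: median CER 0.12
```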
## Benchmarks Considered
The relevant benchmarks gathered by QuiVer Benchmarks are defined in [OCR-D's Quality Assurance specification](https://ocr-d.de/en/spec/eval) and comprise
- CER (per page and document wide), incl.
  - median
[…]
## Custom Workflows and Data
The default behaviour of QuiVer Benchmarks is to collect OCR-D's sample Ground Truth workspaces (currently stored in [quiver-data](https://github.com/OCR-D/quiver-data)), execute the [recommended standard workflows](https://ocr-d.de/en/workflows#recommendations) on these and obtain the relevant [benchmarks](#benchmarks-considered) for each workflow.

You can, however, customize QuiVer Benchmarks to run your own workflows on the sample workspaces or your own OCR-D workspaces.
[…]
Add new OCR-D workflows to the directory `workflows/ocrd_workflows` according to the following conventions:

- OCR workflows have to end with `_ocr.txt`, evaluation workflows with `_eval.txt`. The files will be converted by [OtoN](https://github.com/MehmedGIT/OtoN_Converter) to Nextflow files after the container has started.
- workflows have to be TXT files
- all workflows have to use [`ocrd process`](https://ocr-d.de/en/user_guide#ocrd-process)
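Following these conventions, an OCR workflow file (e.g. a hypothetical `workflows/ocrd_workflows/minimal_ocr.txt`) could look like the sketch below; the processor steps and parameters are illustrative, in the style of OCR-D's recommended workflows, not a prescribed default:

```sh
ocrd process \
  "cis-ocropy-binarize -I OCR-D-IMG -O OCR-D-BIN" \
  "cis-ocropy-segment -I OCR-D-BIN -O OCR-D-SEG" \
  "tesserocr-recognize -I OCR-D-SEG -O OCR-D-OCR"
```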